COMPARING INDIVIDUAL MEANS IN THE
ANALYSIS OF VARIANCE* 


JOHN W. TUKEY 


Princeton University 


The practitioner of the analysis of variance often wants to 
draw as many conclusions as are reasonable about the relation of 
the true means for individual “treatments,” and a statement by 
the F-test (or the z-test) that they are not all alike leaves him 
thoroughly unsatisfied. The problem of breaking up the treatment 
means into distinguishable groups has not been discussed at much 
length, the solutions given in the various textbooks differ and, 
what is more important, seem solely based on intuition. 

After discussing the problem on a basis combining intuition 
with some hard, cold facts about the distributions of certain test 
quantities (or “statistics”), a simple and definite procedure is
proposed for dividing treatments into distinguishable groups, and
for determining that the treatments within some of these groups
are different, although there is not enough evidence to say “which
is which.” The procedure is illustrated on examples.


2. DISCUSSION OF THE PROBLEM 


LET US BEGIN by considering how the latest and most advanced statistical
theory would approach this problem and then explain why
such a solution seems impractical. To make things more precise, let us
suppose as a fictitious example that seven varieties of buckwheat,
A, B, C, D, E, F, and G, have been tested for yield in each of 12 locations,
and that our interest is in the average yield of the buckwheat varieties 
in a region of which the 12 locations are a respectable sample, and in 
years exactly like the one in which the experiment was made. We will 
then have a simple and straightforward analysis of variance into varie- 
ties, locations, and interaction. We shall be concerned with the seven 
observed variety means and with an unbiased estimate of their variance, 
which will be given by 1/12th of the interaction mean square, which is 
itself on 66 degrees of freedom. What can we say about the varieties 
under these conditions? 

We will wish to say, for example, that B and F yield better than 
A, C, and G, which yield better than D and E. Perhaps we might wish
to add that A, C and G are not alike, although we do not know which one 


*Prepared in connection with research sponsored by the Office of Naval Research. 
yields better. The most modern approach would require us to proceed
as follows: Write down all the possible conclusions to which we might
come (the one illustrated above is one of the 120,904 similar possibilities
for seven “treatments”). Then for each combination of seven true mean
yields we should decide how much it would “cost” us to make each of 
these 120,904 decisions. Making the usual assumptions about the 
distribution of fluctuations in yield, we would have begun to state a 
mathematically well-posed problem. We are unlikely to get this far in a 
practical problem in my lifetime! Then we find, to our horror, that there 
are many competing methods of decision, and that which one risks the 
least will depend on the true variety yields, which we will never know. 
The problem is not as hopeless as it sounds, for Wald has taken a large 
step forward, and shown that any decision method can be replaced by 
one derived from a priori probability considerations without increasing 
the risk under any set of true variety yields. This is a great simplifica- 
tion—but the mathematical complications of dealing with 120,904 
functions of seven variables are still awe-inspiring. If we were able to 
carry through this program—to set the risks intelligently, to carry out 
the mathematics, and to choose wisely among the admissible decision 
functions—we would surely do much better than we can hope to do now, 
but for the present we need to adopt a simpler procedure. (Note. The
case of 3 or 4 means has been attacked within the scope of Wald’s theory
by Duncan [7] using a different philosophy which emphasizes
conclusions about pairs of means.)

At a low and practical level, what do we wish to do? We wish to 
separate the varieties into distinguishable groups, as often as we can 
without too frequently separating varieties which should stay together. 
Our criterion of “not too frequently” is a rough one, and may frequently 
be expressed by saying “at the 5% level” or “at the 1% level.” The
meaning of these words deserves a little discussion. To the writer they
do not mean “so that an entirely nonexistent effect will be called real
once in twenty times, or once in a hundred times,” but rather
“with the same sort of protection against false positives that I usually
have when I make tests of significance on hypotheses suggested by the
results tested, successive tests of hypotheses, tests of regression on
selected variables, etc.” For these reasons, working “at the 5% level”
may involve the successive use of tests, each of which yields false posi- 
tives five times in a hundred, but, when used together, will yield seven, 
eight or nine false positives in a hundred. It is such a primitive and 
rough standard that we wish to combine with a primitively and roughly 
outlined desire to detect effects which are really there. From these 
primitive desires we are to seek a method. 
3. THE STIGMATA OF DIFFERENCE


When the real differences between variety means are large, how do we
realize this fact? Three vague criteria come naturally to mind:


(1) There is an unduly wide gap between adjacent variety means
when arranged in order of size,

(2) One variety mean straggles too much from the grand mean,

(3) The variety means taken together are too variable.


It is these three criteria we are going to apply in order to break up an
observed set of means. We need, then, quantitative tests for detecting
(1) excessive gaps, (2) stragglers, (3) excess variability. These must be 
used when the variance of an individual observed mean is not known 
exactly, but rather when it is estimated from some other line of an 
analysis of variance table. The tests which we use must therefore be 
Studentized tests. Exact tests for (2) and (3) are available, but for the 
present we shall confine ourselves to an approximate and conservative 
test for (1). 

If there are only two variety means, the largest gap between adjacent 
means is the same as the absolute value of the difference of the means. 
If $m_1 > m_2$, and $s_m$ is the estimated standard deviation of a single mean, then

$$\frac{m_1 - m_2}{2^{1/2}\, s_m}$$

has one-half of a t-distribution and, assuming normality, exceeds 2.447
only 5% of the time when the two true means are equal and $s_m$ is based
on 6 degrees of freedom. There are good reasons based on experimental
sampling (Section 9) and numerical integration (Section 8) to believe
that the one-sided 5%, 2%, 1% points of

$$\frac{\text{largest gap between adjacent means}}{2^{1/2}\, s_m}$$

are smaller than the corresponding two-sided percentage points of t.
If this is true we will be conservative to use this ratio and the two-sided
percentage points of t as a test of excessive gapping. The reasons are
discussed in a later section.

The exact test of

$$\frac{m_1 - \bar m}{s_m},$$

where $m_1$ is the largest mean and $\bar m$ is the grand mean, has been discussed
for the case of normality by K. R. Nair [4] in a very recent number of
Biometrika. Simple and satisfactory empirical approximation to the
upper percentage points (between 10% and 0.1%) can be obtained by
treating

$$\frac{\dfrac{m_1 - \bar m}{s_m} - \dfrac{6}{5}\log_{10} k}{3\left(\dfrac{1}{4} + \dfrac{1}{n}\right)} \qquad (k > 3 \text{ means})$$

or

$$\frac{\dfrac{m_1 - \bar m}{s_m} - \dfrac{1}{2}}{3\left(\dfrac{1}{4} + \dfrac{1}{n}\right)} \qquad (3 \text{ means})$$

as unit normal deviates, where $s_m$ is based on n degrees of freedom. The
adequacy of this approximation, which avoids the use of multiple entry
tables, is also discussed in Section 6.

The exact test of excessive spread in general will of course be the 
familiar F-test (or z-test). 

We propose to use these tests successively, and in the following order 
and manner. First, apply the gap test to break up the means into one 
or more broad groups. Second, apply the straggler test within these 
groups to further break off stragglers within groups. Third, apply the 
F-test to these new subgroups to detect excess variability. It is hard 
to see how to find the frequency of false positives with the whole system 
analytically, but the writer conjectures that, if the same level, such as 
5%, is used in all three tests, the frequency of false positives will be 
between 1.2 and 1.6 times the level used (i.e., between 6% and 8% 
when a 5% level is used). This is about where the frequency of false 
positives stands for many repeated and result-guided tests of significance 
now in actual practice. 


4. DETAILED PROCEDURE ILLUSTRATED BY EXAMPLES 


The two examples we are going to use are those discussed by 
Newman [5] in connection with the use of the Studentized range. The 
advantages of continuing with the same examples may compensate for 
disadvantages of lack of simplicity, and in the case of the first example, 
lack of appropriateness. This first example is a 6 × 6 Latin square
with potatoes, cited by Fisher [1] in Article 36 of The Design of Experiments.
As first presented this example is stated to be six fertilizer treatments
in a Latin Square, and Newman seems to have based his example
on this discussion. Later on in the book (Article 64), Fisher points
out that these treatments were a 2 × 3 factorial design in nitrogen and
phosphorus, so that there were specific individual degrees of freedom
whose analysis was planned when the experiment was designed. These
were not 6 treatments all on an equal footing, and overall analysis is not
appropriate, but we shall proceed to analyze them as if they were six
treatments about which there is no advance information. The six means
were (A) 345.0, (B) 426.5, (C) 477.8, (D) 405.2, (E) 520.2, (F) 601.8,
and the estimated standard deviation of a mean was $s_m$ = 15.95.

Step 1. Choose a level of significance. For this example we shall choose
5%.

Step 2. Calculate the difference which would have been significant if
there were but two varieties.


The two-sided 5% point of t on 20 degrees of freedom is 2.086. For
this example, then, this least significant difference is
$2.086 \times 2^{1/2} \times 15.95 = 47.0$.


Step 3. Arrange the means in order and consider any gap longer than 
the value found in Step 2 as a group boundary. 


Arranged in order, the means are 345.0, 405.2, 426.5, 477.8, 520.2,
601.8 and the differences 405.2 − 345.0 = 60.2, 477.8 − 426.5 = 51.3,
and 601.8 − 520.2 = 81.6 exceed 47.0, so that we have divided the
varieties into four groups: 345.0 (A) by itself, 405.2 (D) and 426.5 (B)
together, 477.8 (C) and 520.2 (E) together, and 601.8 (F) by itself.
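
Steps 1 to 3 can be sketched in a few lines of Python. This is only an illustration of the gap test as described above, not code from the paper; the function name `gap_groups` and its argument names are assumptions, and the two-sided point of t is supplied by hand rather than looked up.

```python
import math

def gap_groups(means, s_m, t_two_sided):
    """Order the labelled means and start a new group at every gap that
    exceeds the two-mean least significant difference t * 2**0.5 * s_m."""
    lsd = t_two_sided * math.sqrt(2.0) * s_m
    ordered = sorted(means.items(), key=lambda item: item[1])
    groups = [[ordered[0][0]]]
    for (_, previous), (label, current) in zip(ordered, ordered[1:]):
        if current - previous > lsd:
            groups.append([label])      # excessive gap: group boundary
        else:
            groups[-1].append(label)    # same group as its lower neighbour
    return lsd, groups

# Fisher's potato example: six means, s_m = 15.95, t (20 d.f., 5%) = 2.086
fisher = {"A": 345.0, "B": 426.5, "C": 477.8,
          "D": 405.2, "E": 520.2, "F": 601.8}
lsd, groups = gap_groups(fisher, s_m=15.95, t_two_sided=2.086)
print(groups)
```

With the Fisher means this yields the four groups found in the text: A alone, D with B, C with E, and F alone.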

If no group contains more than two means, the process terminates. 
The first example having terminated, we must pass to another to illustrate
the continuance of the process. Snedecor [6] gives as Example
11.28 on p. 274 (of the 4th edition) the results of a 7 × 7 Latin Square
with potatoes. The means were (A) 341.9, (B) 363.1, (C) 360.6, (D)
360.4, (E) 379.9, (F) 386.3, (G) 387.1 and $s_m$ on 30 degrees of freedom
was 9.51. Choosing the 5% level, for which t on 30 degrees of freedom
is 2.042, we find $t \cdot 2^{1/2} s_m$ = 27.5. In order, the means are 341.9, 360.4,
360.6, 363.1, 379.9, 386.3, and 387.1. No difference between adjacent
means exceeds 27.5, so that there is only one group at the end of Step 3.


Step 4. In each group of 3 or more means find the grand mean, the most
straggling mean and the difference of these two divided by $s_m$. Convert
these ratios into approximate unit normal deviates by finding

$$\frac{\dfrac{|m - \bar m|}{s_m} - \dfrac{6}{5}\log_{10} k}{3\left(\dfrac{1}{4} + \dfrac{1}{n}\right)} \qquad (k > 3 \text{ means in the group}),$$


104 BIOMETRICS, JUNE 1949 


or

$$\frac{\dfrac{|m - \bar m|}{s_m} - \dfrac{1}{2}}{3\left(\dfrac{1}{4} + \dfrac{1}{n}\right)} \qquad (3 \text{ means in the group}).$$

Separate off any straggling mean for which this is significant at the
chosen two-sided significance level for the normal.


For the Snedecor example we find $\bar m = 368.5$, and the most straggling
mean is m = 341.9. The ratio is 26.6/9.51 = 2.80. Further, $\log_{10} 7$ =
.845 and we are to consider

$$\frac{2.80 - \frac{6}{5}(0.845)}{3\left(\frac{1}{4} + \frac{1}{30}\right)} = \frac{60}{51}\,(2.80 - 1.01) = 2.10.$$


Since the two-sided 5% level for the unit normal is well known to be 
1.96, we must separate 341.9 (A). 


Step 5. If Step 4 changed any group, repeat the process until no further 
means are separated in the old groups. The means separated off from 
one side of a group form a subgroup. If there are any subgroups of 
three or more when no more means are being separated from groups, 
apply the same process (Steps 4 and 5) to the subgroups. 


The old group in the Snedecor example now contains 6 means, and its 
grand mean has increased to $\bar m$ = 372.9. The most straggling mean is
387.1 for which (387.1 — 372.9)/9.51 = 1.49. The approximate unit 
normal deviate is 60/51 (1.49 — 0.93) = 0.66, which is far from signifi- 
cance. Step 5 has produced no further effect. 
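
A minimal sketch of Steps 4 and 5 for a single group follows. It assumes the rough approximation stated in Step 4 (a shift of (6/5) log10 k for more than three means, 1/2 for three means, and a divisor of 3(1/4 + 1/n)); the function names are hypothetical, and the handling of subgroups of separated means described in Step 5 is left out for brevity.

```python
import math

def straggler_deviate(means, s_m, n):
    """Return (index, approximate unit normal deviate) for the mean that
    lies farthest from the grand mean of the group."""
    k = len(means)
    grand = sum(means) / k
    idx = max(range(k), key=lambda i: abs(means[i] - grand))
    ratio = abs(means[idx] - grand) / s_m
    shift = 0.5 if k == 3 else 1.2 * math.log10(k)
    return idx, (ratio - shift) / (3.0 * (0.25 + 1.0 / n))

def separate_stragglers(labelled, s_m, n, z_crit=1.96):
    """Repeat Step 4 on one group until no straggler is significant."""
    group = dict(labelled)
    separated = []
    while len(group) >= 3:
        labels = list(group)
        idx, z = straggler_deviate([group[lab] for lab in labels], s_m, n)
        if z <= z_crit:
            break
        separated.append(labels[idx])
        del group[labels[idx]]
    return separated, group

# Snedecor's example: seven means, s_m = 9.51 on n = 30 degrees of freedom
snedecor = {"A": 341.9, "B": 363.1, "C": 360.6, "D": 360.4,
            "E": 379.9, "F": 386.3, "G": 387.1}
first_idx, first_z = straggler_deviate(list(snedecor.values()), 9.51, 30)
separated, remaining = separate_stragglers(snedecor, s_m=9.51, n=30)
print(round(first_z, 2), separated, sorted(remaining))
```

On the Snedecor means the first pass separates A and the second pass separates nothing, in agreement with the worked example.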


Step 6. Calculate the sum of squares of deviations from the group mean
and the corresponding mean square for each group or subgroup of 3 or
more resulting from Step 5. Using $s_m^2$ as the denominator, calculate
the variance ratios and apply the F-test.


In the Snedecor example, we have one group of six, for which the sum
of squares of deviations is 829 and the mean square 166. The denominator
is $(9.51)^2 = 90.4$ and the F-ratio 1.83 on 5 and 30 degrees of freedom,
which is near the 12% point. Thus there is no overall evidence of
difference in yield for these six varieties.
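
The arithmetic of Step 6 can be checked with a short sketch. The function name `spread_f_ratio` is an assumption. The rounded means as listed give a sum of squares a few units above the printed 829, presumably because the printed figure was computed from unrounded data; the F-ratio is affected only in the second decimal.

```python
def spread_f_ratio(means, s_m):
    """Sum of squares of deviations from the group mean, its mean square,
    the variance ratio against s_m**2, and the numerator degrees of freedom."""
    k = len(means)
    grand = sum(means) / k
    sum_sq = sum((m - grand) ** 2 for m in means)
    mean_sq = sum_sq / (k - 1)
    return sum_sq, mean_sq, mean_sq / s_m ** 2, k - 1

# The six Snedecor means left after A has been separated
six = [363.1, 360.6, 360.4, 379.9, 386.3, 387.1]
sum_sq, mean_sq, f_ratio, df = spread_f_ratio(six, s_m=9.51)
print(round(sum_sq), round(mean_sq), round(f_ratio, 2), df)
```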

If varieties (B) 363.1, (C) 360.6, and (D) 360.4 had been known in
advance to be different as a class from varieties (E) 379.9, (F) 386.3, and
(G) 387.1, it would be fair to introduce a single degree of freedom for this
comparison, giving an analysis of variance (in terms of means) like this.


                              Degrees of      Mean
                              Freedom         Square
BCD vs EFG                         1           794
Varieties within classes           4             8.8
Error                             30            90.4


From this we could conclude that BCD and EFG were different, even at 
the 1% level. There is no valid basis for this particular conclusion unless 
the classes are uniquely known in advance of the experiment. (There
are 20 ways to split six varieties into two classes of three varieties each, 
so that the apparent significance of the most significant split would be 
expected to be at a percentage level near 1/20th of the percentage level 
of the whole group. The actual figures are, approximately, 0.6% and 
12% and their agreement with the 1-to-20 ratio is unusually close.) 

In the Fisher example, the proposed procedure gave the following 
result: Variety A (345.0) is significantly lower than varieties D (405.2) 
and B (426.5), these in turn are significantly lower than C (477.8) and E 
(520.2), and in turn these are significantly lower than F (601.8). All 
significance statements are statistical, and are at the 5% level or better. 

In the Snedecor example, the proposed procedure gave the following 
result: Variety A (341.9) was significantly lower than some of the varieties C
(360.6), D (360.4), B (363.1), E (379.9), F (386.3), and G (387.1) at the
5% level or better; the group of 6 varieties showed no overall evidence of
internal differences at the 5% level.

These conclusions should be compared with those of Newman, who 
used the Studentized range to conclude in the first case that even taking 
ADB and CEF as two groups, neither was homogeneous. This is con- 
sistent with the result of the present analysis, but far less detailed. For 
the Snedecor example, Newman found that if either A or F and G 
together were made a separate group, the remainder seemed homogene- 
ous. This is again consistent, but less detailed, since the present process 
finds definite reason to suppose that it is A which is inhomogeneous. 
(How much stronger is the evidence we have against A than against F 
and G is another matter.) 

The writer feels that the proposed procedure is direct, reasonably
simple, involves no new tables, and is ready to be used in practice and
thereby put to the ultimate test.
5. THE DISTRIBUTION OF THE MAXIMUM GAP

We are interested in the following problem:


“Let a sample of k values (in our case means) be drawn from a normal
distribution, of which we know only an independent estimate s of its
standard deviation, based on n degrees of freedom. What is the distribution
of

$$\frac{\text{largest gap}}{s}\;?\text{”}$$

The methods of Hartley, reviewed in detail by Nair [4], would allow us
to solve this problem for finite n if we knew the answer for infinite n,
that is, for the case where we know σ, the standard deviation of the
normal population.

The problem of the distribution of the largest gap in a sample of k 
values from a unit normal distribution can easily be attacked by experi- 
mental sampling (see Section 9). The fact that the random normal 
deviates of Mahalanobis [3] are printed in blocks of five leads one to 
study k = 5 and k = 10 first. The first 1000 blocks of five in that table 
were used (skipping block 768, which was marked as an error in the copy 
available to the author). 

The results are shown below: 


TABLE 1


UPPER PERCENTAGE POINTS OF THE LARGEST GAP IN AN
ORDERED SAMPLE OF k FROM A UNIT NORMAL


          k = 2        k = 5            k = 10
  %       theory       sample of        sample of
                       1000 cases       500 cases
 10       2.33         [illegible]      <1.50
  5       2.77         2.13             1.68
  2       3.29         2.49             1.95
  1       3.64         [illegible]      2.42


The theoretical values for k = 2 are values of $t \cdot 2^{1/2}$ and are accurate;
the others are as found by experimental sampling and may deviate from
accuracy by perhaps 1 or 2 in the first decimal. They are sufficiently
accurate, however, to indicate that the upper percentage point decreases
as k increases. Thus if we use the values for k = 2 we will make a
conservative test. This is true for n = ∞, and by the nature of Hartley’s
expansion it will continue to hold for all reasonable values of n.
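
The claim that the upper percentage points fall as k increases can be checked today by pseudo-random sampling instead of a printed table of normal deviates. The sketch below is an assumption-laden illustration: it uses Python's `random.gauss` with an arbitrary seed and 20,000 trials in place of the Mahalanobis table, and the function names are hypothetical, so its figures will differ somewhat from those of Table 1.

```python
import random

def largest_gap(sample):
    """Largest difference between adjacent values of the ordered sample."""
    ordered = sorted(sample)
    return max(b - a for a, b in zip(ordered, ordered[1:]))

def upper_points(k, percents=(10, 5, 2, 1), trials=20000, seed=1949):
    """Empirical upper percentage points of the largest gap in samples
    of k from the unit normal (known standard deviation, n = infinity)."""
    rng = random.Random(seed)
    gaps = sorted(
        largest_gap([rng.gauss(0.0, 1.0) for _ in range(k)])
        for _ in range(trials)
    )
    return {p: gaps[min(trials - 1, round(trials * (1 - p / 100)))]
            for p in percents}

for k in (2, 5, 10):
    print(k, {p: round(v, 2) for p, v in upper_points(k).items()})
```

For k = 2 the simulated 5% point should lie close to the theoretical 1.96 × 2^{1/2} = 2.77, and the points for k = 5 and k = 10 should be successively smaller.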
TABLE 2
QUALITY OF APPROXIMATION OF PERCENTAGE POINTS FOR THE STRAGGLER TEST


Normal percentage point
minus accurate              Occurs for                        Cases
percentage point


  0.15 to  0.20             1%, 8 or 9 means, n = 20             6
                            3 means, n < 15

  0.10 to  0.15             5%, 3 or 4 means, n < 24            33
                            1%, 4 means, n < 11
                            1%, 3 means, n < 30

  0.05 to  0.10             5%, 5 means, n < 24                 21
                            5%, 3 or 4 means, n < 60
                            1%, 4 means, n < 11
                            1%, 3 means, n < 120

 -0.05 to +0.05             otherwise                          154

 -0.10 to -0.05             10%, all cases                      20
                            5%, 9 means, n = 10, 11


The discussion in Section 2 suggests, of course, that it would be cor- 
rect and wise to find accurately the percentage points of the largest gap 
for various values of k and then use the appropriate values of k. This is 
not being suggested for the present, because: 


(1) the necessary table does not exist, 

(2) it would complicate the procedure, 

(3) there are problems in choosing the appropriate value of k, 

(4) the simpler proposed procedure has not yet been used enough to 
show its characteristics. 


6. THE STUDENTIZED EXTREME DEVIATE 


In his recent paper, Nair [4] has given the following upper percentage
points for 3 to 9 samples: (A) the 10%, 5%, 2.5%, 1%, 0.5% points
for n = ∞, (B) the 5% points for n from 10 to 20 and 24, 30, 40, 60, 120,
∞, (C) the 1% points for the same values of n. The accuracy of our
rough approximation is most easily considered by transforming them
into percentage points for the approximate unit normal deviates (these
are what should be used for accuracy) and comparing these with the
percentage points of the normal (these are what we propose to use).
Such a comparison has the following results (Table 2).

Thus for about two-thirds of the cases tabulated by Nair, the error is less
than 0.05, and is surely negligible in practice.

In doubtful cases, a more precise approximate test may be made as
follows. Let

$$w = \frac{|m - \bar m|}{s_m} \qquad (m \text{ an extreme mean}).$$

Then treat

$$\left(\frac{k}{k-1}\right)^{1/2}\left[\,w - \frac{10}{3n}\,(w - 1.2)\right]$$

as a unit normal deviate and multiply the tail area by k if only one kind of
straggler (high or low) could be considered, and by 2k otherwise. Thus
if m = 52, $\bar m$ = 43, $s_m$ = 4, k = 13, n = 28,

$$w = \frac{52 - 43}{4} = 2.25,$$

$$\left(\frac{13}{12}\right)^{1/2}\left[\,2.25 - \frac{10}{84}\,(1.05)\right] = 1.041\,(2.25 - 0.13) = 2.20.$$

Now the probability of a unit normal deviate ≥ 2.20 is 0.01390 (from
any normal table, e.g. Fisher and Yates [2] Table IX where 98.610%
corresponds to a probit of 7.2000). Multiplying by 11 gives 15.3% as
the approximate significance, if only low means are of interest, while the
level is 30.6% when either high or low means are involved.

This approximation is discussed by Nair [4] for the case n = ∞, where
it is due to McKay. Nair shows that it is very good indeed. The effectiveness
of the term in $n^{-1}$ may be tested by calculating the true percentage
points for $w - \frac{10}{3n}(w - 1.2)$ from Nair’s tables.


TABLE 3
UPPER PERCENTAGE POINTS FOR $w - \frac{10}{3n}(w - 1.2)$


                      5% points                                        1% points
  n     k = 3         5       7             9          k = 3         5             7             9
 10     [illegible]   2.06    2.24          2.35       2.24          [illegible]   [illegible]   2.85
 15     1.76          2.08    2.26          2.39       2.27          2.62          2.81          2.93
 20     [illegible]   2.08    [illegible]   2.39       [illegible]   2.62          2.82          2.92
 30     [illegible]   2.09    [illegible]   2.40       2.25          2.61          2.82          2.93
  ∞     1.74          2.08    [illegible]   2.39       2.22          [illegible]   [illegible]   2.88
The errors involved in the use of the values at the bottom of the columns
of Table 3 instead of those above them can hardly ever be of practical 
importance. 

The previous approximation is recommended for routine work since 
it involves less computation and no changing of significance levels. Both 
approximations are only good for upper percentage points in the signifi- 
cance test range. The latter approximation should meet all practical 
needs. 

The writer would rarely bother with the more precise approximation 
except possibly for the cases where the error of the rough test is between 
—0.10 and —0.05. The original experimental values are likely to be 
somewhat non-normal with large tails. An accurate allowance for this 
would be hard to compute, but it would increase the accurate percentage 
point slightly, more for smaller n. The rough approximation tends to 
compensate for this fact in most cases. 


7. THE DISTRIBUTION OF LONG GAPS IN A SAMPLE OF k FROM
ANY POPULATION

While we could concern ourselves with the distribution of the longest
gap, the next longest gap, and so on, it seems theoretically better and
practically simpler to do something somewhat different. We are going
to calculate the expected number of gaps longer than a length G, which
we denote by $p_1$. For the sort of test considered above, there is much
reason to use $p_1$. For $p_1$ is the fraction of gaps per sample which will be
falsely judged significant. If it is as bad to find two false gaps in a sample
as to find one false gap in each of two samples, then we should
consider $p_1$.

Now we shall take the definition of a gap starting at y to be that y is
the left-hand end of the gap. If y is the left-hand end of a gap of length at
least G, we have the following table of elementary probabilities:


Event                                                   Probability


One observation must fall between y and y + dy          $k\,dF(y)$


k − 1 observations must fall between −∞ and y
or between y + G and +∞                                 $\left(F(y) + 1 - F(y + G)\right)^{k-1}$


Not all k − 1 observations can fall between −∞
and y                                                   $-\left(F(y)\right)^{k-1}$


hence

$$p_1 = k \int_{-\infty}^{\infty} \left\{ \left(F(y) + 1 - F(y + G)\right)^{k-1} - \left(F(y)\right)^{k-1} \right\} dF(y).$$


8. THE SYMMETRICAL CASE 


If the distribution of x is symmetrical about zero, we may count only
the gaps with centers to the left of the origin and then double. The
expression for $p_1$ follows from:


Event                                                   Probability


One observation must fall between y and y + dy          $k\,dF(y)$


k − 1 observations must fall between −∞ and y
or y + G and +∞                                         $\left(F(y) + 1 - F(y + G)\right)^{k-1}$


Not all k − 1 observations can fall between −∞
and y or −y and +∞                                      $-\left(2F(y)\right)^{k-1}$


Since $y < -\tfrac{1}{2}G$, and since the result is to be doubled, we have

$$p_1 = 2k \int_{-\infty}^{-G/2} \left\{ \left(F(y) + 1 - F(y + G)\right)^{k-1} - \left(2F(y)\right)^{k-1} \right\} dF(y).$$

Making the substitutions u = F(y), h(u) = F(y) + 1 − F(y + G), this
becomes

$$p_1 = 2k \int_0^{F(-G/2)} h^{k-1}\,du - \left\{ 2F\!\left(-\frac{G}{2}\right) \right\}^{k}.$$


For reasonably large G, the second term is fairly small and we can get
an accurate value of $p_1$ with a reasonable amount of labor.

As an example, let us take the unit normal distribution and G = 2.
Since h(u) is non-analytic near 1 and has a minimum at F(−1) = .1587,
it is natural to break the integral up into parts as follows:

$$p_1 = 2k\int_0^{.0004} h^{k-1}\,du + 2k\int_{.0004}^{.004} h^{k-1}\,du + 2k\int_{.004}^{.04} h^{k-1}\,du + 2k\int_{.04}^{.16} h^{k-1}\,du - 0.0013\,(2k)\left(h(.1587)\right)^{k-1} - (.3174)^{k}.$$


Calculating h to four decimals, applying Simpson’s rule to the range
from 0 to .0004, and the corresponding six-panel rule to the other three
ranges yields the following results, where the terms are given in the order
of the formula above:


  k       +         +         +         +         −         −        $p_1$

  2     .00151    .01168    .07875    .16781    .00165    .10074    .15736
  3     .00214    .01426    .06603    .08885    .00079    .03206    .13843
  4     .00270    .01553    .04227    .04224    .00032    .01014    .09228
  5     .00320    .01590    .03035    .01908    .00013    .00322    .06518
  6     .00363    .01567    .02635    .00835    .00005    .00102    .05293
  7     .00402    .01508    .01880    .00353    .00002    .00032    .04109
  8     .00436    .01426    .01342    .00154    .00001    .00010    .03347
  9     .00468    .01330    .00959    .00065    .00000    .00003    .02819
 10     .00493    .01230    .00691    .00029    .00000    .00000    .02443


The value for k = 2 can of course be calculated directly as

$$2\left(1 - F(2^{1/2})\right) = 2(.0787) = .1574.$$

The results are probably accurate to 1 or 2 in the fourth place. They
can be conveniently stated as in the following table:


TABLE 4


NUMBER OF GAPS LONGER THAN 2.00 EXPECTED PER 100 SAMPLES OF k FROM THE
UNIT NORMAL


k                           2       3       4      5      6      7      8      9      10
gaps per 100 samples      15.74   13.84   9.23   6.52   5.29   4.11   3.35   2.82   2.44

9. RESULTS OF EXPERIMENTAL SAMPLING 


The results of the experimental sampling of 1000 sets of 5 from
Mahalanobis’ approximation to the unit normal are given in the following
table (Table 5).

The approximate normality of $(\text{largest gap})^{1/2}$ in this sample, as indicated
by the correspondence of the last two columns between the 2%
points, is striking. For comparison it seemed worthwhile to examine the
normality of $(\text{largest gap})^{1/2}$ for k = 2, where the probability of a
gap > G is $2N(-G/2^{1/2})$, where N(w) is the unit normal cumulative. This
gives the following results (Table 6).




TABLE 5


RESULTS OF EXPERIMENTAL SAMPLING. DISTRIBUTION OF LARGEST GAPS IN
1000 SAMPLES OF 5


                                       Equiv.          $\dfrac{(\text{gap})^{1/2} - 1.07}{.23}$
Cell             Number    Cumul.      Norm. Dev.
 .185- .199         2         2        -2.88           (-2.70)
 .200- .299         9        11        -2.29           (-2.26)
 .300- .399        20        31        -1.87            -1.90
 .400- .499        28        59        -1.56            -1.57
 .500- .699        97       156        -1.01            -1.00
 .700- .899       141       297        -0.53            -0.52
 .900-1.099       172       469        -0.08            -0.09
1.100-1.299       149       618         0.30             0.30
1.300-1.499       126       744         0.66             0.68
1.500-1.699       110       854         1.05             1.00
1.700-1.899        56       910         1.34            [illegible]
1.900-2.099        36       946         1.61             1.64
2.100-2.299        24       970         1.88             1.90
2.300-2.499        11       981         2.07            [illegible]
2.500-2.699         8       989         2.29            [illegible]
2.700-2.899         4       993         2.46            [illegible]
2.900-3.099         4       997         2.75            (2.99)
3.100-3.299         2       999         3.09            (3.20)
4.000-4.099         1      1000          ∞
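
The last two columns of Table 5 can be recomputed as follows. The sketch assumes that the equivalent normal deviate is the unit normal point cutting off the cumulative proportion, and that the fitted deviate is evaluated at the upper edge of each cell (for example 1.1 for the cell .900-1.099); the function name is hypothetical.

```python
import math
from statistics import NormalDist

def equivalent_deviates(upper_edge, cumulative, total=1000,
                        centre=1.07, scale=0.23):
    """Observed equivalent normal deviate for a cumulative count, and the
    deviate implied by treating (gap)**0.5 as normal(centre, scale)."""
    observed = NormalDist().inv_cdf(cumulative / total)
    fitted = (math.sqrt(upper_edge) - centre) / scale
    return observed, fitted

for edge, cumul in ((0.5, 59), (1.1, 469), (1.3, 618), (2.1, 946)):
    obs, fit = equivalent_deviates(edge, cumul)
    print(edge, cumul, round(obs, 2), round(fit, 2))
```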


Here the fit is good between the 10% points. This suggests that the
$(\text{largest gap})^{1/2}$ may be a convenient interpolation variable.

The number of cases > 2.00 actually found was 68, while the number 
to be expected according to the last section was 65.2 less an allowance 
for large double gaps which might amount to one unit. Finding 68 
instead of 64 is a deviation of 0.50, and is highly reasonable. 

For k = 10, the count was only made for gaps > 1.5, with the follow- 
ing results, (Table 7). 

The fit here is reasonably good out to the 5% point. Since theory 
predicts about 12.2 beyond 2.00 instead of 9 observed, there is no serious 
disagreement here. 

If we want to make real use of this $(\text{gap})^{1/2}$ variable, we may use the
known percentages beyond 1.414, found for k between 2 and 10 in the
last section, to fix lines in the plane of the mean and standard deviation
TABLE 6
CUMULATIVE FOR (LARGEST GAP)^{1/2} IN SAMPLES OF 2 FROM THE UNIT NORMAL


                                     Equiv. Norm.      $\dfrac{(\text{gap})^{1/2} - .98}{.44}$
  %       gap       (gap)^{1/2}      Deviate
  1      .0177        .1334           -2.33            (-1.95)
  2      .0357        .189            -2.05            (-1.80)
  5      .0891        .299            -1.64            (-1.55)
 10      .1781        .423            -1.28             -1.26
 20      .360         .600            -0.84             -0.86
 50      .960         .980             0.00              0.00
 80     1.825        1.353             0.84              0.85
 90     2.300        1.536             1.28              1.26
 95     2.794        1.672             1.64             [illegible]
 98     3.308        1.821             2.05             (1.91)
 99     3.650        1.914             2.33             (2.12)


TABLE 7


RESULTS OF EXPERIMENTAL SAMPLING
DISTRIBUTION OF LARGEST GAPS IN 500 SAMPLES OF 10


                                       Equiv. Norm.    $\dfrac{(\text{gap})^{1/2} - 0.85}{.24}$
Cell             Number    Cumul.      Deviate
     -1.499       454       454         1.33             1.38
1.500-1.599        15       469         1.54            [illegible]
1.600-1.699         9       478         1.71             1.68
1.700-1.799         9       487         1.94             1.82
1.800-1.899         2       489         2.01             1.98
1.900-1.999         2       491         2.10             2.08
2.000-2.199         1       492         2.14            (2.39)
2.200-2.399         2       494         2.26            (2.48)
2.400-2.599         3       497         2.51            (2.93)
2.600-2.799         2       499         2.88            (3.13)
3.100-3.199         1       500          ∞

of the approximation. A little bold, dashing, freehand, two-dimensional 
interpolation produces the following results: 
TABLE 8
TENTATIVE BEHAVIOR OF (LARGEST GAP)^{1/2} FOR SAMPLES OF k FROM THE UNIT
NORMAL

        Parameters                   Levels for (gap)^{1/2}                       Levels for gap
  k     m             s              5%            2.5%          1%              5%            2.5%          1%
  2     0.98          0.43           1.69          1.82          1.98            2.8           [illegible]   3.9
  3     1.03          0.36           1.62          1.74          [illegible]     2.6           3.0           3.5
  4     1.06          0.27           1.50          1.59          1.69            2.3           2.5           2.8
  5     1.06          0.23           1.43          [illegible]   1.60            [illegible]   [illegible]   2.6
  6     1.06          0.22           1.42          1.49          1.57            2.0           2.2           [illegible]
  7     1.04          0.21           1.39          1.45          1.53            1.9           [illegible]   2.3
  8     [illegible]   [illegible]    [illegible]   1.43          [illegible]     1.9           2.0           2.3
  9     1.00          0.21           [illegible]   1.41          1.49            1.8           2.0           2.2
 10     0.99          0.22           [illegible]   1.40          1.48            1.8           2.0           2.2


By a stroke of luck, the levels for the gap itself might be accurate to one
or two tenths. These are, of course, unstudentized levels.
METHODS OF ESTIMATING TOTAL RUNS
AND ESCAPEMENTS OF SALMON


GEORGE A. ROUNSEFELL


Chief, Atlantic Salmon Investigations 
Branch of Fishery Biology 
Fish and Wildlife Service 
U.S. Department of the Interior 


In creating the management programs necessary for the con- 
servation of salmon fisheries, special attention should be given to 
salmon investigations of the past to profit by their successes and 
failures. Investigations of the Fraser River sockeye salmon 
afford excellent illustrative material for developing management 
procedures. This fishery was studied by several famous biologists 
of the past, including Edward E. Prince, Richard Rathbun, 
David Starr Jordan and Charles Henry Gilbert. They have been 
followed by Wilbert A. Clemens, Lucy S. Clemens, Henry O’Malley,
Willis H. Rich, R. Earle Foerster, William E. Ricker, George
A. Rounsefell, George B. Kelez and Will F. Thompson, all of 
whom have published on the fishery. These Fraser River investi- 
gations have been pursued over such a long period as to make their 
study of peculiar importance in the evaluation of methods. 
Having personally collected, analyzed and published statistical 
data on gear, catches and abundance up to 1934, I have drawn 
freely upon them. 


THE FIRST STEP in any quantitative biological investigation is to
delimit the population being studied. Salmon catches often represent 
mixed populations as the salmon bound for any particular river may 
traverse numerous waterways before reaching the river mouth, and be 
taken along with salmon bound for other rivers. Extensive marking 
and tagging experiments have partially delimited the populations for 
many Alaskan rivers but much remains to be done, especially for the 
great red-salmon streams in Bristol Bay. The Fraser River sockeye 
run has been so dominant in its area that this problem has not assumed 
as great importance although there is still insufficient knowledge as to 
what portion of the run enters the Gulf of Georgia from the north. 
Tagging experiments by Clemens showed that sockeyes using this north- 
ern route were bound chiefly for the Fraser River; and there is some 
evidence that a larger proportion enter the Gulf north of Vancouver 
Island during the warmer years. 

Once the population has been circumscribed, the problem is to de- 
termine within reasonable limits what factors may be responsible for 
annual variations in its size. The methods developed by Baranov (1918) 
and greatly improved by Ricker (1940, 1944) for the determination of the 
abundance of populations of marine fishes are not applicable, because 
any particular salmon is not subject to capture throughout the season 
but only during that part of the season during which it runs the gauntlet 
of the gear; and because the total population dies after spawning and 
is therefore available during but one season. 

For wise management it is desirable to know within reasonable limits 
the actual numbers of fish both in the catch and in the escapement 
through the fishery toward the spawning grounds. In some of the smaller 
salmon rivers it has been possible to erect weirs and count the number of 
salmon escaping through the fishery, and in a few of the larger rivers 
they are counted while ascending fishways. This escapement, when 
added to the total catch attributed to fish spawned in the same river, 
gives the total run of the season. Such counts have been made since 
1921 at Karluk River, Alaska, and since 1935 at Bonneville Dam, on the 
Columbia River, but because such weir counts have proved impractical 
in most larger rivers, recourse must be had to other methods. The Fish 
and Wildlife Service, the Fisheries Research Board of Canada, and the 
International Pacific Salmon Fisheries Commission, have marked and 
released adult salmon near the mouth of a river to determine the number 
of spawners from the proportionate numbers of marked and unmarked 
fish found on the spawning grounds. The Fish and Wildlife Service 
is also perfecting the use of aerial photography to cover the vast lake 
and river systems in Bristol Bay. However, such methods do not give 
us an insight into the past, and it will take many years to furnish a long 
enough series to show what variations to expect. 

For the Fraser River, Rounsefell and Kelez (1938) developed indices 
of relative abundance based on the catch-per-unit of fishing effort by 
standardized amounts of fishing gear, but all such indices fail to show 
the total population and are therefore somewhat limited in their appli- 
cation. 

Using the data given by Rounsefell and Kelez (1938) an estimate has 
been developed of the total salmon run of each season to the Fraser 
River from 1894 to 1945. This method, which may find application in 
other salmon investigations, has been adapted from a method developed 
by Professor D. B. DeLury (1947) of the Ontario Research Foundation. 

The notation used by Professor DeLury is as follows: 

$C$ = catch per unit of fishing effort 
$N$ = number in population at any time ($t$) 
$E$ = number of units of fishing effort 
$b$ = $k \log_{10} e$, in which $k$ is a constant 

He has shown that 

(1) $\log C(t) = \log (k N_0) - b E(t)$ 

This is equivalent to a linear equation in which 

(2) $\log y = \log a - b x$ 


Once the equation has been determined, the total population before 
fishing commences equals $k N_0 / k$, that is, the antilog of the 
intercept divided by $k$. 

In a closed population the total population at the commencement of 
fishing may be determined by periodically plotting $\log C$ against the 
accumulated effort, $E$, throughout the season and then fitting a linear 
regression to the data in order to determine the slope, $b$, and the 
intercept, $\log (k N_0)$. 
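
The fitting step just described can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `delury_estimate` and the season of effort and catch figures are hypothetical, common logarithms are assumed, and the data are generated from a known closed population so that the recovered values can be checked.

```python
import numpy as np

LOG10_E = np.log10(np.e)  # about 0.4343, the factor linking b and k


def delury_estimate(cum_effort, cpue):
    """Fit log10(C) = log10(k*N0) - b*E and return (k, N0).

    cum_effort: accumulated units of fishing effort, E
    cpue: catch per unit of effort, C, observed at each E
    """
    cum_effort = np.asarray(cum_effort, dtype=float)
    log_c = np.log10(np.asarray(cpue, dtype=float))
    slope, intercept = np.polyfit(cum_effort, log_c, 1)
    b = -slope                     # equation (1) carries the slope with a minus sign
    k = b / LOG10_E                # from b = k * log10(e)
    n0 = 10.0 ** intercept / k     # the intercept is log10(k * N0)
    return k, n0


# Hypothetical closed population: N0 = 50,000 fish, k = 0.001 per unit of effort.
effort = np.arange(0, 1000, 100, dtype=float)
cpue = 0.001 * 50_000 * np.exp(-0.001 * effort)
k_hat, n0_hat = delury_estimate(effort, cpue)
```

With real catch records the points scatter about the line, and the precision of the estimate of the initial population depends on how well the slope is determined.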

In adapting this method to the salmon data there is available for 
most years only the average catch per unit of gillnet effort for each year. 
Since the same population was not present each year it was necessary to 
determine the relationship between the catch per unit of fishing effort 
and the number of units of effort by a multiple covariance analysis, 
using the data of Rounsefell and Kelez for the 39 years from 1896 
through 1934. Three variables were used: 1) The index of abundance 
from traps (expressed in logarithms). This I regard as the best measure 
of the runs before they reach the river. 2) The number of gillnets 
fished. 3) The catch per gillnet (expressed in logarithms). Figure 1 
shows the regression of the logarithm of catch per gillnet on number of 
gillnets when the abundance is held constant. The effect of cyclic 
differences was removed by a covariance analysis (see Table 1). 

From this regression the catch per unit of effort at 0 units is easily 
calculated for each season. The notation used is as follows: 


$Y$ = log of catch per unit of gillnet effort 

$X_1$ = log of trap index of abundance 

$X_2$ = number of gillnet units of fishing effort 

$C'$ = antilog of $Y$, or estimated catch per unit of gillnet effort 

(0) and ($p$) = number of gillnet units in $X_2$ at 0 nets and at 
any particular number of nets ($p$) 

$\bar{x}_1$ = mean of $X_1$ 

$C$ = observed catch per unit of gillnet effort 

$C'_{x_2(0)(\bar{x}_1)}$ = calculated catch per gillnet unit when $X_2$ 
equals 0, with $X_1$ at its mean $\bar{x}_1$ 

$C'_{x_2(p)(\bar{x}_1)}$ = calculated catch per gillnet unit when $X_2$ 
equals $p$, with $X_1$ at its mean $\bar{x}_1$ 

$C_{x_2(p)}$ = observed catch per gillnet unit when $X_2$ equals $p$ 
for any given season with $p$ number of nets 

$C_{x_2(0)}$ = unknown catch per gillnet unit when $X_2$ equals 
0, for any given season 

[Figure 1. Vertical axis: log. of gillnet catches. Horizontal axis: units of gillnet effort.] 

FIGURE 1 

The regression of the logarithm of the catch per gillnet on number of gillnets when the abundance is 
held constant. 


From the multiple regression formula, $Y = 0.953100 + 0.940834 X_1 - 0.000529 X_2$, 
the estimate of $Y$ with $X_1$ at $\bar{x}_1$ and $X_2$ at 0 is 2.733, the 
antilog $C'_{x_2(0)(\bar{x}_1)} = 541$. Similarly, it is easy to obtain 
$C'_{x_2(p)(\bar{x}_1)}$ by substituting the units of effort, $p$, for any 
given year for $X_2$ in the formula with $X_1$ at $\bar{x}_1$. The observed 
catch per gillnet unit of effort for the season, $C_{x_2(p)}$, is already 
known. The catch per unit of effort for the given season at 0 units of 
effort, $C_{x_2(0)}$, is then easily obtained by simple proportion: 

$$\frac{C'_{x_2(0)(\bar{x}_1)}}{C'_{x_2(p)(\bar{x}_1)}} = \frac{C_{x_2(0)}}{C_{x_2(p)}}$$
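
The simple proportion can be sketched as follows. The regression coefficients and the mean of the trap index (1.8920) are taken from the text and Table 1; the function names and the example season (an observed catch of 200 per gillnet unit with 400 units fished) are hypothetical and serve only to show the arithmetic.

```python
# Coefficients of the multiple regression quoted in the text.
B0, B1, B2 = 0.953100, 0.940834, -0.000529
X1_MEAN = 1.8920  # mean of X1 (log of trap index) from Table 1


def calc_catch(nets, x1=X1_MEAN):
    """Calculated catch per gillnet unit (antilog of Y) at a given number of nets."""
    return 10.0 ** (B0 + B1 * x1 + B2 * nets)


def catch_at_zero_effort(observed_catch, nets):
    """Scale an observed seasonal catch per unit to 0 units of effort by proportion."""
    return observed_catch * calc_catch(0.0) / calc_catch(nets)


c0_at_mean = calc_catch(0.0)  # about 541, as in the text

# Hypothetical season: 200 fish per gillnet unit observed with 400 units fished.
example = catch_at_zero_effort(observed_catch=200.0, nets=400.0)
```

Because the regression is linear in the logarithm, the ratio of the two calculated catches depends only on the number of nets, so the mean of the trap index cancels out of the proportion.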
TABLE 1 

MULTIPLE REGRESSION OF THE LOGARITHM OF THE CATCH PER GILLNET UNIT OF 
EFFORT, Y, ON THE LOGARITHM OF THE TRAP INDEX OF ABUNDANCE, X1, AND 
OF THE NUMBER OF UNITS OF GILLNET FISHING EFFORT, X2 (DATA FROM 
ROUNSEFELL AND KELEZ, 1938), FROM 1896 TO 1934, INCLUSIVE (NOTATION 
FOLLOWS SNEDECOR, 1946) 

n = 39    mean Y = 2.5558    mean X1 = 1.8920    mean X2 = 334.9744    No. cycles = 4 

Sums of squares and products 

Source of variation    | D.F. | Sx1²     | Sx2²    | Sy²      | Sx1x2     | Sx1y     | Sx2y 
Total                  | 38   | 6.028285 | 631,507 | 5.376020 | 1,269.551 | 4.771096 | 854.829 
Cycles                 |  3   | 1.577233 |  26,227 | 1.097768 |   188.646 | 1.144999 | 152.229 
Within cycles (error)  | 35   | 4.451233 | 605,280 | 4.278252 | 1,080.905 | 3.626097 | 702.600 

Correlation coefficients:   r(y, x1) = 0.83097   r(y, x2) = 0.48664   r(x1, x2) = 0.65869   R = 0.84887 
Regression coefficients:    b'(y1.2) = 0.95963   b'(y2.1) = -0.19536 
Regression equation:        Y = 0.953100 + 0.940834 X1 - 0.000529 X2 
Standard error of estimate: s(y.12) = 0.19319 


Once $C_{x_2(0)}$ had been obtained for any particular season, it could 
then be divided by $k$, 0.000529/.434, or 0.00122, to give the total number 
of fish reaching the river. However, although the estimates of the 
population so derived are correct in relative size to one another they do 
not yield the true population because the slope within years is not the 
same as the slope, $b$, between years, which is the only slope calculable 
from the available data. 
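
As a worked check of the constant used here, assuming common logarithms so that $\log_{10} e \approx 0.4343$, the relation $b = k \log_{10} e$ gives

$$k = \frac{b}{\log_{10} e} = \frac{0.000529}{0.4343} \approx 0.00122, \qquad N = \frac{C_{x_2(0)}}{k}.$$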

A close approximation to the true populations can, however, be 
obtained by determining the relation between the calculated populations 
and estimates of the total population in one or more years. This was 
done by comparing the calculated estimates for the five years from 1941 
through 1945 with the estimate of the total run to the river derived by 
adding the escapement estimated by the International Pacific Salmon 
Fisheries Commission to the gillnet catch on the river. The correlation, 
r, between the two estimates is 0.9933. From this comparison it was 
obvious that by multiplying the calculated populations by 4.2896 a 
close approximation of the true population could be obtained, so this 
procedure was followed. 

TABLE 2 

CALCULATED ESCAPEMENTS AND TOTAL RUNS IN THOUSANDS FOR THE FRASER 
RIVER SOCKEYE SALMON¹ 

                  CYCLE A                                       CYCLE B 
                        Total        Percent                          Total        Percent 
Year   Escapement       run²         escapement   Year   Escapement   run          escapement 
1894       3431      [illegible]       44.5       1895       3507       8658         40.5 
1898        767         5837           13.1       1899       1431      12799         11.2 
1902       1214         8393           14.5       1903        610       4863         12.5 
1906       1251         5348           23.4       1907        418       2140         19.5 
1910       1157         5613           20.6       1911        658       2837         23.2 
1914        689         6382           10.8       1915        347    [illegible]     16.0 
1918         35          946        [illegible]   1919        318       1567         20.3 
1922        456         1550           29.4       1923        442       1299         34.0 
1926       1268         2650           47.8       1927        804       2587         31.1 
1930        944         5532           17.1       1931        502       1936         25.9 
1934        972         5992           16.2       1935        709       2119         33.5 
1938   [illegible]      5026        [illegible]   1939        364       1457         25.0 
1942       2397        10682           22.4       1943        180        773         23.3 

                  CYCLE C                                       CYCLE D 
1896       1196         5494           21.8       1897       4629      19051         24.3 
1900        374         4760            7.9       1901   [illegible]   28132          8.4 
1904        389         2728           14.3       1905       3476      24157         14.4 
1908        676         3426           19.7       1909       1636      22562          7.3 
1912       1094         4457           24.5       1913       7157      38500         18.6 
1916        117         1403            8.3       1917        435       7318          5.9 
1920        431         1641           26.3       1921        354       2040         17.4 
1924        549         1763           31.1       1925        619       2448         25.3 
1928        314         1256           25.0       1929        617       2676         23.1 
1932        678         2265           29.9       1933        469       2919         16.1 
1936       2030         4763           42.6       1937        593       2189         27.1 
1940        725      [illegible]       28.4       1941       1392       4631         30.1 
1944        542         2050           26.4       1945        454       2048         22.2 

¹Subsequent to 1934 the data on number of gillnet licenses published by the Dominion Fisheries 
Department have been made comparable to the gillnet units of effort given by Rounsefell and Kelez 
(1938) by multiplying the number of licenses by 1.375. These data were used in making the above 
calculations. 

²Escapement, plus gillnet catch, plus catch made before run reached the river. 
The calculated escapements and total runs are shown in Table 2. 
It will be noted that the escapement for the “big” year of 1909 appears 
very low, especially in comparison to 1913. This is probably due to the 
fact that a large part of the escapement in 1909 and prior “big” years, 
as pointed out by Rounsefell and Kelez (1938) was derived from a very 
late fall run that was not fished by the fishery and consequently would 
not appear in the calculations. Certainly, the rough estimates of escape- 
ment available for the earlier years of the fishery do not yield any serious 
estimates of actual numbers. Even the 4,000,000 fish reported to have 
passed into Quesnel Lake in 1909 are, on careful reading, only an estimate 
made by counting for a short period each day and then estimating 
according to the number of hours. There is no mention of what portion 
of the day was chosen for counting! 

The proportion that the escapement has formed of the total run has 
shown some interesting changes (see Table 2, and Figure 2). It may be 
noted that the proportion fell to the all-time low of 6 percent in the 
exceptionally intense fishery of 1917 during the first World War. After 
the partial obstruction of the upstream spawning migrations of 1913 and 
1914, the proportion rose due partially to stricter regulation and par- 
tially to lack of interest in the small runs. 
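
The percentage escapement is simply the escapement divided by the total run. A minimal sketch, using four rows of Table 2 (escapement and total run in thousands); the variable and function names are illustrative only:

```python
# (escapement, total run) in thousands, copied from Table 2.
table2_rows = {
    1897: (4629, 19051),
    1913: (7157, 38500),
    1917: (435, 7318),
    1941: (1392, 4631),
}


def percent_escapement(escapement, total_run):
    """Escapement as a percentage of the total run."""
    return 100.0 * escapement / total_run


percent = {year: round(percent_escapement(esc, run), 1)
           for year, (esc, run) in table2_rows.items()}
```

The value for 1917 comes to 5.9, which is the figure rounded to 6 percent in the text.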

[Figure 2. Vertical axis: percentage escapement. Horizontal axis: year, 1900 to 1940.] 

FIGURE 2 
Showing the percentage that the escapement has formed of the total salmon run to the Fraser River, 
1894 to 1945. 

The runs of each of the 4-year age cycles are shown in Figure 3. 
The most striking feature is the decline of all but the “big” year cycle 
prior to the Hell’s Gate slide of 1913-14, showing that overfishing played 
a major role in the decline of the Fraser River runs. Even in 1913, 
enough fish passed the Gate to bring back a large run, larger than all 
but a few off years, in 1917. Had fishing been curbed in that year the 
“big” year cycle might have continued to dominate. Instead the fishery 
was so intense that the escapement was proportionately the lowest on 
record. 

FIGURE 3 
Showing the logarithms of the total runs by each age cycle and the trend of the geometric means 
smoothed by four. Data from Table 2. Cycle A equals black dots, Cycle B equals open circles, Cycle C 
equals crossed circles, and Cycle D equals black triangles. 

One major problem with two facets in the management of salmon 
runs is that of escapement. First, to what extent are variations in the 
size of the escapements associated with variations in the size of the runs? 
Second, what size of escapement will produce the largest surplus for the 
fishery out of the runs? 

In order to answer the first question the calculated escapements and 
total runs for the Fraser River from 1894 to 1945 have been employed 
(Table 2). The total run and the escapement four years previous, are 
thus available for 48 pairs of observations. 
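
The count of 48 pairs follows from matching each escapement with the run four years later. A small sketch of the pairing (variable names are illustrative):

```python
# Seasons covered by Table 2.
years = list(range(1894, 1946))
year_set = set(years)

# Pair each spawning year with the season in which its progeny return.
pairs = [(year, year + 4) for year in years if year + 4 in year_set]
```

The last four escapements, 1942 through 1945, have no return year within the series and so drop out.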


[Figure 4. Vertical axis: logarithm of total run produced. Horizontal axis: logarithm of escapement.] 

FIGURE 4 

Showing the regression of the logarithm of the total run on the logarithm of the escapement four years 
previous. One standard error of estimate shown by dotted lines. 


The size of the escapements and the size of the resulting runs show 
a high correlation, 0.7071. The coefficient of determination, 0.50, indi- 
cates that 50 per cent of the variation in the run is due to variation in 
the escapement. The pairs of observations and the regression line are 
plotted in Figure 4. It is obvious that for the Fraser River at least, 
the size of the escapement is by far the dominant factor in determining 
the size of the runs. 
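
The coefficient of determination quoted above is the square of the correlation coefficient:

$$r^2 = (0.7071)^2 \approx 0.50.$$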

However, there is a considerable range of variation in the returns 
from escapements of the same size which appears too large to be accounted 
for wholly by random variation. Foerster (1944) showed that 
predators can play an important role in the success of survival. Another 
likely source of variability is the great differences that exist in the dis- 
tribution of spawners amongst the several major spawning areas in 
different years. 

Thompson (1945) ascribed the major fluctuations in the success of 
reproduction in recent years to an obstruction in Hell’s Gate Canyon 
at certain water levels of the upstream migration of adults bound for 
their spawning grounds. 

This point can be tested by relating for the spawning years 1915 to 
1941 the residuals of Y from the linear regression line shown in Figure 5 
to the number of days given as passable at Hell’s Gate in the report of 
Thompson. The correlation coefficient is 0.387, and a probability of 
0.05 demands a coefficient of 0.381 with 25 degrees of freedom (27 pairs 
of observations). The regression coefficient, $b$, is 0.00527, which when 
divided by its standard error, 0.00251, yields a “t” of 2.1, which may 
have statistical significance. 
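
The two routes to the test statistic can be compared in a short sketch. It assumes one pair of observations for each spawning year from 1915 to 1941 (27 pairs, hence 25 degrees of freedom); the function name is illustrative.

```python
import math


def t_from_r(r, n_pairs):
    """Student's t for a simple correlation coefficient, with its degrees of freedom."""
    df = n_pairs - 2
    return r * math.sqrt(df) / math.sqrt(1.0 - r * r), df


t_r, df = t_from_r(0.387, 27)   # from the correlation coefficient
t_b = 0.00527 / 0.00251         # regression coefficient over its standard error
```

Both routes give a value close to 2.1, as they should, since the test of a simple regression coefficient and the test of the correlation coefficient are equivalent.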


[Figure 5. Vertical axis: residuals of Y. Horizontal axis: days passable at Hell’s Gate.] 

FIGURE 5 

Showing the regression of the residuals of Y (See text and Figure 4) on the days passable at Hell’s Gate. 


Perhaps it should be pointed out that even though the linear correla- 
tion coefficient of 0.387 were highly significant, it would mean only that 
15 percent of the residual variability or only 7.5 percent of the total 
variability in the runs could be ascribed to the effects of water levels, 
or of causes associated with water levels. Thus the data suggest a 7.5 
percent effect of water levels at Hell’s Gate on the success of spawning. 
However, since the runs above Hell’s Gate have been smaller in the 
years with water level data available, and the effect is obscured by the 
inclusion in the data of the runs to the areas below Hell’s Gate, it is 
possible that the effect is much larger than the data indicate. 
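
The percentages in this paragraph follow from squaring the correlation coefficient and applying it to the half of the variation left unexplained by escapement:

$$r^2 = (0.387)^2 \approx 0.150, \qquad 0.150 \times (1 - 0.50) \approx 0.075.$$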

The data on returns from escapements (Figure 4) indicate that the 
survival rate of the progeny decreases as the size of the escapement 
increases. Therefore, the largest difference between the size of the es- 
capement and the number of returning salmon occurred when escape- 
ments were intermediate in size. This relationship has been repeatedly 
demonstrated in studies of population growth. However, it is not fully 
appreciated by the public in general. The clamor is for a return of the 
“good old days’’ when sockeye ascended the river in tremendous hordes. 
It is not generally realized that at very high population levels the 
efficiency of reproduction is so low that the major share of the run must 
spawn to maintain that level, leaving little surplus for the fishery. 
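
The argument can be illustrated with a sketch under a loudly stated assumption: suppose the regression of Figure 4 is read as a power law, run = a × escapement^b with 0 < b < 1. The constants a and b below are hypothetical, chosen only for illustration, and are not the fitted values, which the text does not give.

```python
def surplus(escapement, a, b):
    """Fish available to the fishery: returning run minus the spawners needed."""
    return a * escapement ** b - escapement


def optimum_escapement(a, b):
    """Escapement maximizing the surplus for run = a * S**b with 0 < b < 1.

    Setting the derivative a*b*S**(b-1) - 1 to zero gives S = (a*b)**(1/(1-b)).
    """
    return (a * b) ** (1.0 / (1.0 - b))


A_HYP, B_HYP = 40.0, 0.7   # hypothetical constants, not taken from the paper
s_opt = optimum_escapement(A_HYP, B_HYP)
```

Under this assumed form the surplus falls away on either side of the optimum, which is the sense in which the largest harvest comes from an intermediate escapement.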

Under the conditions prevailing in the Fraser River watershed during 
the 52 years from 1894 to 1945, the variation in returns is so great that 
any prediction is extremely hazardous. All that can safely be said is 
that the largest number of sockeye will be available for the fishery when 
the population is maintained at some optimum intermediate level of 
abundance. 

The fact that there appears to be a maximum sustained harvest that 
could be taken under conditions prevailing during the past 52 years does 
not mean that there are no methods for increasing the harvest. Un- 
doubtedly the lower efficiency of reproduction associated with large 
escapements has resulted partially from the overseeding of some spawn- 
ing areas while others were underseeded. Regulation of the catch to 
permit larger proportionate escapements when the runs bound to under- 
seeded watersheds are passing through the fishery might yield larger 
returns from the same number of spawners. 

It should be borne in mind, however, that dominant cycles occurring 
every fourth year have been characteristic of the runs of sockeye to some 
of the lakes in the Fraser system. Further study will be necessary to 
determine whether it is desirable to seed these lakes every year. There 
is some possibility that such a procedure might produce lesser yields 
than the maintenance of dominant four-year cycles (not all occurring on 
the same year) in different lakes. 

Foerster (1944) made an excellent contribution in determining the 
effect of the control of predator and competing species in raising the 
survival rate of young sockeye in fresh water. As pointed out by 
Rounsefell (1946) this method holds tremendous promise for the future. 

A third method for increasing the returns per spawner consists in 
providing passage for salmon into lakes and streams now barren because 
of permanent stream obstructions. This method is being extensively 
employed at present by the Fish and Wildlife Service and the Interna- 
tional Pacific Salmon Fisheries Commission. 

In summary, a method has been shown for estimating the total run 
and the escapement in a salmon fishery, thus providing basic data neces- 
sary for intelligent management. It has been shown for one great river, 
the Fraser, that the size of each year’s run is closely correlated with the 
number of spawners. It has also been shown that the total number of 
salmon that can be harvested from a run on a permanent basis cannot be 
increased beyond a certain point merely by increasing the number of 
spawners, unless either the spawners are so distributed over the water- 
shed as to make better use of the available areas, or the environment 
of the nursery areas is changed. 
THE ANALYSIS OF EXTINCTION TIME DATA IN BIOASSAY 
K. MATHER 


Department of Genetics, University of Birmingham 


THE PROBLEM 


THE STATISTICAL analysis of data from bioassays depending on quantal 
responses has received much attention during the last twenty 
years and the necessary analytical principles are now well understood. 
In general a transformation is sought of the proportions of tests on 
individuals subjected to treatment, which will yield a linear relation 
between proportion responding, as so transformed, and the treatment 
as measured on some suitable scale. The regression line defining this 
relation is then calculated and is used to find the treatment corre- 
sponding to some standard proportion of response chosen as con- 
venient for purposes of comparison. Where the variation in treatment 
is a variation in dose administered, this standard is usually the dose 
giving the response in 50% of subjects. It is then designated as the 
Effective Dose 50 (ED50), or Lethal Dose 50 (LD50) where the response 
observed is death. Where time of exposure is the variable in 
treatment we have similarly Effective Time 50 (ET50) and Lethal 
Time 50 (LT50). The statistical analysis also provides means of testing 
the linearity of the regression relation, of comparing two or more 
regressions for correspondence, particularly in slope, and for finding the 
standard errors, and hence confidence limits, of the various quantities 
taken as specifying the regression line or measuring the potency of the 
treatment. 

These methods have been developed particularly in relation to the 
type of assay where each treatment is administered to a different group of 
subjects, the fate of each of which can be observed individually. The 
experimental data then consist of proportions of individuals observed to 
respond to the treatment at its various levels and the analysis is under- 
taken by the now familiar method of probits, which has been fully 
described in both its derivation and its application by Finney (1947). 
The use of the probits in the comparison of bactericidal properties of 
a range of disinfectants has been illustrated by Berry and Michaels 
(1947, 1948, and in the press) who counted the numbers of bacteria 
surviving after exposure to these disinfectants for various lengths of 
time of samples from an original homogeneous culture. 
Not all tests of bactericidal action however yield data of this kind. 
Another type of test, which we may refer to as the method of Ex- 
tinction Times, aims at finding the time necessary to kill, or at least 
render ineffective, all the bacteria in the test sample. In practice a 
series of samples of the standard bacterial suspension, to which has 
been added the disinfectant, are “quenched” with nutritive broth after 
suitable intervals have elapsed. The samples are incubated, and any 
in which one or more active bacteria survive will then be detectable 
by the growth of the organism. The complete extinction of active 
bacteria can thus be related to the time of exposure to the disinfectant. 

It has been supposed that there must exist a unique extinction time 
for any given bacterial suspension subjected to any given treatment of 
disinfection: that all samples quenched before the characteristic time 
had elapsed would show growth, and all after that time would be devoid 
of active bacteria. Such a simple view, of course, overlooks the varia- 
tion which occurs between individual bacteria in their tolerance of 
disinfectants, and which must lead to marked sampling variation in 
the outcome of this type of test particularly. Instead of a simple sharp 
extinction point being shown, a series of tests may differ in the time 
of exposure after which no growth is found. And even in a single test 
growth may be found in samples which have been quenched after a 
longer exposure than others where no growth occurs. The series of 18 
tests, in each of which samples were quenched every 2 minutes between 
exposure times of 12 and 26 minutes, shown in Table 1 illustrate the 
type of data which is yielded by the method of extinction times. Our 
problem is that of analysing such data so as to specify and measure in 
a way suitable for comparative purposes the relation between exposure 
time and extinction. 

I am indebted to Professor H. Berry and Mr. H. S. Bean of the 
School of Pharmacy, University of London for drawing my attention 
to the problem, and to Mr. Bean also for providing me with the data 
of Table 1 as illustrative material. 


THE METHOD OF ANALYSIS 


The natural variation in individual tolerance of the bacteria will 
result in the death or inactivation of the organisms not being simul- 
taneous. The number of survivors in a sample will not suddenly 
become zero, but will diminish more or less gradually as the period of 
exposure lengthens. This number will itself be subject to sampling 
variation, so that although there will be a mean number of survivors 
characteristic of a range of similar samples exposed for similar times, 
the number surviving will vary from sample to sample. Where, as in 
TABLE 1 
DISINFECTION OF BACTERIUM COLI (LISTER STRAIN 5933) BY 1.15% PHENOL 


Time of exposure in minutes 
Series 12 14 16 18 20 22 24 26 
1 sis “ + = =: a cae = 
2 ai == = a = = = 
3 a5 5 siz = ar =, = = 
4 = = + == = == = Ste 
5 =r == = aie = = = a 
6 at ote =e ats = = = = 
7 at Sie = =F = “Ir = 
8 zis ais + ate = = = = 
9 zip a5 == sie “Ts sae az = 
10 ar a += = aie == == = 
11 ale a a= te = = = > 
12 == = alg =f = = = = 
13 te == a7 = =i ae = = 
14 a = ae =f ar te = = 
15 = te “iF ae = = = = — 
16 alm =e a = = = = = 
17 us + pe = = Bs = = 
18 ae at = ae = = ca = 
Total -ve   0     0     3      7      12     14     17     17 
p           0.0   0.0   0.17   0.39   0.67   0.78   0.94   0.94 
λ           -     -     1.79   0.94   0.41   0.25   0.06   0.06 
y           -     -     0.58  -0.06  -0.90  -1.38  -2.86  -2.86 


+ = positive reaction, i.e. bacteria surviving 

- = negative reaction, i.e. no bacteria surviving 
our tests, the survival of even one individual will lead to a positive 
result, the useful observations of response must cover ranges of samples 
in which the mean number of survivors is small. We shall in fact be 
concerned with the situation where comparable samples may contain 
only 0, 1, 2, 3 etc. individual survivors at the time of quenching. The 
frequencies of samples with these various numbers of survivors will 
then be expected to follow a Poisson distribution. Thus if the mean 
number of survivors is $\lambda$, the proportions of samples with 0, 1, 2, 3 etc. 
should fall in the series 

$$e^{-\lambda}\left(1,\ \lambda,\ \frac{\lambda^2}{2!},\ \frac{\lambda^3}{3!},\ \ldots\right)$$

The technique distinguishes only between samples with on the one 
hand 0, and on the other 1 or more survivors, so that a proportion 
$e^{-\lambda}$ should give the negative result of no growth and $1 - e^{-\lambda}$ the positive 
result of growth. The mean number of survivors is thus estimated by 
$-\log p$, where $p$ is the proportion of negative samples. 

Now the mean number may itself be taken as falling off logarithmic- 
ally with time (or some simple function such as the logarithm of time) 
at least over the short range of assay times with which we are concerned. 
In other words $\log \lambda$ should be linearly related to time expressed 
on a suitable scale. Hence $\log \lambda = \log (-\log p)$ should give 
a straight line, within the limit of sampling error, when plotted against 
time. Instead, therefore of transforming into probits in order to 
achieve the desired linear relation, we must transform the data by 
taking the logarithm of the negative logarithm of the proportion of 
samples which fails to show growth after each of the exposure times 
used in the assay. This we will refer to as the loglog transformation 
for the sake of convenience. It may be remarked here that the trans- 
formation must be made using natural logarithms or the subsequent 
test of homogeneity of the samples and linearity of the relation with 
time will be vitiated. 
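
The loglog transformation can be sketched directly from the totals at the foot of Table 1 (18 samples at each exposure time). The names below are illustrative; natural logarithms are used, as the text requires. Times at which no negative sample was observed are left without a value, since the loglog is then indeterminately large.

```python
import math

N_SAMPLES = 18
# Number of -ve samples at each exposure time (minutes), from Table 1.
negatives = {12: 0, 14: 0, 16: 3, 18: 7, 20: 12, 22: 14, 24: 17, 26: 17}


def loglog(p):
    """Natural log of the negative natural log of a proportion 0 < p < 1."""
    return math.log(-math.log(p))


rows = {}
for time, r in negatives.items():
    p = r / N_SAMPLES
    if p == 0.0:
        rows[time] = (p, None, None)          # mean survivors not estimable
    else:
        lam = -math.log(p)                    # estimated mean number of survivors
        rows[time] = (p, lam, math.log(lam))  # (p, lambda, y)
```

For the 16-minute exposure this gives a mean of about 1.79 survivors and a loglog of about 0.58, agreeing with the bottom rows of Table 1.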

Given this basic principle of transformation, the rest of the analysis 
with its weighting coefficients and working adjustments can be found 
by Fisher’s method of maximum likelihood, applied in the way which 
Finney (1947) develops in his Appendix II. Where $x$ is the time of 
exposure and $Y = \log (-\log P)$, $P$ being the chance of a sample being 
-ve, we wish to find the constants $\alpha$ and $\beta$ in the rectilinear relation 

$$Y = \alpha + \beta x$$


The weight to be given in the calculation to any $Y$ value can be found 
simply as the amount of information which the observed classification 
into -ve and +ve samples yields about $Y$. This is found as 

$$I_Y = n\,S\left[\frac{1}{m}\left(\frac{dm}{dY}\right)^2\right]$$

where $m$ is the proportion in the class in question, $n$ is the number of 
samples exposed for the given time and $S$ indicates summation over 
all classes. Now $Y = \log (-\log P)$ so that $dP/dY = P \log P$, and 
$dQ/dY = d(1 - P)/dY = -P \log P$. 

Then 

$$I_Y = n\left[\frac{1}{P}\left(\frac{dP}{dY}\right)^2 + \frac{1}{Q}\left(\frac{dQ}{dY}\right)^2\right] = n (P \log P)^2\left(\frac{1}{P} + \frac{1}{Q}\right) = \frac{n P \log^2 P}{Q}$$

Where after a given exposure the chance of a sample being -ve is 
$P$, the probability of $r$ samples being -ve out of $n$ observed is 

$$\frac{n!}{r!\,(n - r)!}\,P^r Q^{n-r}$$
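
The weighting coefficient per sample, $P \log^2 P / Q$, derived above can be evaluated for any provisional loglog $Y$. A minimal sketch follows; the function name is illustrative, and the two values of $Y$ are the trial values listed in Table 2 for exposures of 12 and 16 minutes.

```python
import math


def loglog_weight(Y):
    """Weight per sample, P log^2 P / Q, for an expected loglog Y."""
    lam = math.exp(Y)       # -log P, since Y = log(-log P)
    P = math.exp(-lam)      # expected proportion of -ve samples
    Q = 1.0 - P
    return P * lam ** 2 / Q


w_12 = loglog_weight(1.9)   # about 0.0559
w_16 = loglog_weight(0.5)   # about 0.6472
```

The weight is small where nearly all samples are expected to be positive or nearly all negative, so such exposure times contribute little to the fitted line.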


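With a series of exposure times the log likelihood of the particular set 
of results observed is proportional to 

$$L = S[r \log P] + S[(n - r) \log Q]$$

summation proceeding over all times. Thus if $\alpha$ and $\beta$ are the 
parameters determining $Y$ and hence $P$, their estimates will be given by the 
solution of the equations 

$$\frac{\partial L}{\partial \alpha} = S\left[\frac{r}{P}\frac{\partial P}{\partial \alpha}\right] + S\left[\frac{n - r}{Q}\frac{\partial Q}{\partial \alpha}\right] = S\left[\frac{n(p - P)}{PQ}\frac{\partial P}{\partial \alpha}\right] = 0$$

$$\frac{\partial L}{\partial \beta} = S\left[\frac{r}{P}\frac{\partial P}{\partial \beta}\right] + S\left[\frac{n - r}{Q}\frac{\partial Q}{\partial \beta}\right] = S\left[\frac{n(p - P)}{PQ}\frac{\partial P}{\partial \beta}\right] = 0$$

where $p = r/n$ is the observed proportion of -ve samples. 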

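The direct estimation of $\alpha$ and $\beta$ from these equations may well, as 
Finney points out, be impossible. They may, however, be obtained 
by a process of successive approximation. If, therefore, we obtain a 
first approximate relation between $Y$ and time by inspection, adjustments 
to $\alpha$ and $\beta$ can then be found from the general equations 

$$\frac{\partial L}{\partial \alpha_1} + \delta\alpha\,\frac{\partial^2 L}{\partial \alpha_1^2} + \delta\beta\,\frac{\partial^2 L}{\partial \alpha_1\,\partial \beta_1} = 0$$

$$\frac{\partial L}{\partial \beta_1} + \delta\alpha\,\frac{\partial^2 L}{\partial \alpha_1\,\partial \beta_1} + \delta\beta\,\frac{\partial^2 L}{\partial \beta_1^2} = 0$$

the suffix to $\alpha$ and $\beta$ denoting the substitution of the approximate values 
after differentiation. 

Now 

$$Y = \alpha + \beta x$$

so that 

$$\frac{\partial P}{\partial \alpha} = P \log P \quad\text{and}\quad \frac{\partial P}{\partial \beta} = x\,P \log P$$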


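and the adjustment equations become 

$$\delta\alpha\,S\left[\frac{nP \log^2 P}{Q}\right] + \delta\beta\,S\left[\frac{nP \log^2 P}{Q}\,x\right] = S\left[\frac{n \log P}{Q}(p - P)\right]$$

$$\delta\alpha\,S\left[\frac{nP \log^2 P}{Q}\,x\right] + \delta\beta\,S\left[\frac{nP \log^2 P}{Q}\,x^2\right] = S\left[\frac{n x \log P}{Q}(p - P)\right]$$

Then substituting the weighting coefficient 

$$w = \frac{I_Y}{n} = \frac{P \log^2 P}{Q}$$

$$\delta\alpha\,S(nw) + \delta\beta\,S(nwx) = S\left[nw\,\frac{p - P}{P \log P}\right] \qquad (1)$$

$$\delta\alpha\,S(nwx) + \delta\beta\,S(nwx^2) = S\left[nwx\,\frac{p - P}{P \log P}\right] \qquad (2)$$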
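Let 

$$Y = a + b(x - \bar{x}) \quad (= \alpha + \beta x)$$

so that 

$$a = \alpha + \beta\bar{x} \quad\text{and}\quad b = \beta$$

where 

$$\bar{x} = \frac{S(nwx)}{S(nw)}$$

is the weighted mean of $x$. 

Equation (1) then becomes 

$$(\delta\alpha + \delta\beta\,\bar{x})\,S(nw) = \delta a\,S(nw) = S\left[nw\,\frac{p - P}{P \log P}\right]$$

Multiplying (1) by $\bar{x}$ and subtracting the product from (2) gives 

$$\delta\alpha\,S(nwx) - \bar{x}\,\delta\alpha\,S(nw) + \delta\beta\,S(nwx^2) - \bar{x}\,\delta\beta\,S(nwx) = S\left[nwx\,\frac{p - P}{P \log P}\right] - \bar{x}\,S\left[nw\,\frac{p - P}{P \log P}\right]$$

Now 

$$\delta\alpha\,S(nwx) = \bar{x}\,\delta\alpha\,S(nw)$$

and 

$$\bar{x}\,\delta\beta\,S(nwx) = \delta\beta\,\frac{S^2(nwx)}{S(nw)}$$

so that the last equation becomes 

$$\delta\beta\left[S(nwx^2) - \frac{S^2(nwx)}{S(nw)}\right] = S\left[nw\,\frac{p - P}{P \log P}\,(x - \bar{x})\right]$$

or 

$$\delta\beta\,S[nw(x - \bar{x})^2] = S\left[nw(x - \bar{x})\,\frac{p - P}{P \log P}\right]$$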


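Put 

$$y_w = Y + \frac{p - P}{P \log P}$$

the working loglog, and we find 

$$a_2 = a_1 + \delta a = \frac{S(nwY)}{S(nw)} + \frac{S\left[nw\,\dfrac{p - P}{P \log P}\right]}{S(nw)} = \frac{S(nwy_w)}{S(nw)}$$

and 

$$b_2 = b_1 + \delta b = \frac{S[nw(x - \bar{x})Y]}{S[nw(x - \bar{x})^2]} + \frac{S\left[nw(x - \bar{x})\,\dfrac{p - P}{P \log P}\right]}{S[nw(x - \bar{x})^2]} = \frac{S[nw(x - \bar{x})y_w]}{S[nw(x - \bar{x})^2]}$$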
which are the required equations of estimation of the line relating 
response and time to the next approximation. The process can be 
repeated using the new calculated approximation to Y to obtain a third 


approximation, and so on as long as is necessary. 


The analysis is thus exactly like a probit analysis except that we
have

\[ y = \log(-\log p), \]

\[ w = \frac{P (\log P)^2}{Q} \]

and

\[ y_w = Y + \frac{p - P}{P \log P} \]

in place of the corresponding probit relations.


FIGURE 1

The calculation of the regression of loglog (y) on time (x) for the
disinfection of Bacterium coli by 1.15% phenol. The observed results
are shown as dots. The arrows to the dots at times 12 and 14 indicate
that no negative cultures were obtained at these times so that the
observed loglog is indeterminately large.
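The relations above can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the original paper: the function names (`loglog`, `expected_proportion`, `weight`, `working_loglog`, `fit_step`) are hypothetical, natural logarithms are assumed (as the values of Table I imply), and the inputs in the usage lines are the x, p and trial Y columns of Table 2.

```python
import math

def loglog(p):
    """Observed loglog y = log(-log p), natural logs, for 0 < p < 1."""
    return math.log(-math.log(p))

def expected_proportion(Y):
    """Counter transformation P = exp(-exp(Y))."""
    return math.exp(-math.exp(Y))

def weight(P):
    """Weighting coefficient w = P (log P)^2 / Q, with Q = 1 - P."""
    return P * math.log(P) ** 2 / (1.0 - P)

def working_loglog(p, Y):
    """Working loglog y_w = Y + (p - P) / (P log P)."""
    P = expected_proportion(Y)
    return Y + (p - P) / (P * math.log(P))

def fit_step(x, p, Y, n=1.0):
    """One cycle of the approximation: weighted regression of y_w on x.

    Returns (a, b, xbar) for the line Y = a + b (x - xbar).
    """
    w = [n * weight(expected_proportion(Yi)) for Yi in Y]
    yw = [working_loglog(pi, Yi) for pi, Yi in zip(p, Y)]
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    a = sum(wi * yi for wi, yi in zip(w, yw)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * yi for wi, xi, yi in zip(w, x, yw))
    return a, sxy / sxx, xbar

# Usage with the x, p and trial Y columns of Table 2 (n = 18 at each time).
x = [12, 14, 16, 18, 20, 22, 24, 26]
p = [0.0, 0.0, 0.1667, 0.3889, 0.6667, 0.7778, 0.9444, 0.9444]
Y_trial = [1.9, 1.2, 0.5, -0.2, -0.9, -1.6, -2.3, -3.0]
a, b, xbar = fit_step(x, p, Y_trial, n=18)
print(a, b, xbar)
```

Feeding the new line back in as the next trial Y repeats the cycle; because the weights and working loglogs are recomputed here rather than read from tables, the last decimal places may differ slightly from the printed figures.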


THE CALCULATION 


The data of Table 1 will serve to illustrate the practical use of these
equations of estimation. The data consist of 18 samples at each of
the 8 times. The observed values of p, $\lambda$ (= -log p) and y (= log
(-log p)) are shown at the bottom of the Table and y is plotted against
the time x in Fig. 1. At times¹ 12 and 14 no -ve samples were found
so that y cannot be shown in the graph for these times. A trial line
is drawn by inspection and for it we find the expected values, Y, as
shown in Table 2. From these are found, by the counter transformation
P = antilog (-antilog Y), the expected values of log P, P and Q
(= 1 - P). The weighting coefficients then follow as $P(\log P)^2/Q$.
Knowing both the observed p and the first approximate expectation
we can compute (p - P)/(P log P), which when added to Y gives
$y_w$, the working loglog.


¹Time has been expressed in minutes in this calculation. The scale of log minutes might be desirable
on some occasions.


TABLE 2
CALCULATION OF THE LOGLOG REGRESSION LINE


 x     p        y        Y      P        w        y_w        Y
                                                         (2nd approx)

12   0.0000     --      1.9   0.0012   0.0559    2.0496    2.1630
14   0.0000     --      1.2   0.0361   0.4128    1.5012    1.4109
16   0.1667   0.5831    0.5   0.1923   0.6472    0.5807    0.6587
18   0.3889  -0.0571   -0.2   0.4410   0.5288   -0.0557   -0.0935
20   0.6667  -0.9038   -0.9   0.6661   0.3293   -0.9022   -0.8457
22   0.7778  -1.3812   -1.6   0.8172   0.1822   -1.3612   -1.5979
24   0.9444  -2.8612   -2.3   0.9046   0.0954   -2.7386   -2.3501
26   0.9444  -2.8612   -3.0   0.9514   0.0485   -2.8522   -3.1023

S(wx) = 40.4686      S(wy_w) = 0.135944      S(w) = 2.3001

S(wx^2) = 733.6140   S(wy_w^2) = 3.100673    S(wxy_w) = -5.73134


1st calculated regression line, from which Y (2nd approx) is found, is Y = 6.6763 - 0.37610x

Table I, of the loglog transformation itself, and Table II of the 
weighting coefficients, maximum loglogs and working ranges, have been 
prepared to facilitate this part of the calculation. It will be observed 
that unlike probits, the relation of loglog to proportion is not symmetrical 
round 50% so that all these Tables must be constructed to cover the 
full range of P = 0 to 1.

Having obtained the values of $y_w$ and w for each time x, the calculation
proceeds as in probit analysis. One minor simplification will,
however, often be possible and has been used in the present case. In
all, 18 series of samples were run, each covering all 8 times. The weight
of each of the 8 points in the calculation will thus be 18w where w is
the weighting coefficient. To save multiplying each of 8 w's by 18,
the actual w's themselves have been used in the calculation. Since,
where n is constant for all points,


\[ \bar{y}_w = \frac{S(nwy_w)}{S(nw)} = \frac{S(wy_w)}{S(w)}, \]


\[ \bar{x} = \frac{S(nwx)}{S(nw)} = \frac{S(wx)}{S(w)} \]


TABLE I
TRANSFORMATION OF EXTINCTION POINT DATA TO LOGLOGS


Y = log (-log p)

p     0.000   0.001   0.002   0.003   0.004   0.005   0.006   0.007   0.008   0.009
0.00 co 1.933] 1.827) 1.759] 1.709) 1.667) 1.633) 1.602) 1.574) 1.550 
0.01 1.527/ 1.506] 1.487) 1.466] 1.451) 1.435) 1.420) 1.405) 1.392) 1.377 
0.02 1.364) 1.351) 1.340) 1.328) 1.316] 1.305) 1.295) 1.284) 1.274) 1.264 
0.03 15255) 1 5245)) 1.236) 15227) 0 2218)) 1209) 20 P1938), 85s de 
0.04 ea ess ye aba) sal saWsy ty ae AXoyh absaley)i) ab te y2)) al aleysy)  aeabalksy| Tuatha fos? 
0.05 1.097; 1.091) 1.084) 1.077) 1.071) 1.065) 1.059) 1.053} 1.046) 1.040 
0.06 1.034) 1.029} 1.023); 1.017) 1.011} 1.005} 1.000; 0.994) 0.989) 0.983 
0.07 0.978} 0.973] 0.967) 0.962) 0.957) 0.952] 0.947) 0.942) 0.937) 0.932 
0.08 0.927) 0.922) 0.917; 0.912) 0.907) 0.902} 0.897} 0.893) 0.888) 0.883 
0.09 0.879} 0.874) 0.870} 0.865) 0.861) 0.856] 0.852} 0.847| 0.843) 0.838 
0.10 0.834) 0.830} 0.825} 0.821) 0.817) 0.813] 0.808} 0.804| 0.800) 0.796 
@ yal 0.792) 0.788) 0.784) 0.780) 0.775) 0.771] 0.767) 0.763) 0.759} 0.756 
0.12 0.752) 0.748) 0.744) 0.740) 0.736] 0.732] 0.728} 0.724) 0.721) 0.717 
0.13 0.713) 0.709} 0.706} 0.702) 0.698] 0.694} 0.691} 0.68 0.683} 0.680 
0.14 0.676) 0.672) 0.668) 0.665] 0.662} 0.658} 0.655) 0.651] 0.647) 0.644 
0.15 0.640) 0.637) 0.633] 0.630} 0.626) 0.623) 0.619] 0.616} 0.613) 0.609 
0.16 0.606) 0.602} 0.599} 0.596] 0.592] 0.589} 0.585] 0.582] 0.57 0.575 
0.17 0.572) 0.569) 0.565) 0.562) 0.559] 0.556) 0.552! 0.549] 0.546] 0.543 
0.18 0.539} 0.536} 0.533) 0.529} 0.526) 0.523] 0.520} 0.517| 0.513] 0.510 
0.19 0.507} 0.504) 0.501! 0.498] 0.494] 0.492} 0.489] 0.485} 0.482] 0.47 
0.20 0.476} 0.473} 0.470} 0.467) 0.464] 0.461) 0.457} 0.454) 0.451] 0.448 
0.21 0.445} 0.442) 0.439] 0.436]. 0.433) 0.430] 0.427) 0.424] 0.421) 0.418 
0.22 0.415) 0.412) 0.409) 0.406} 0.403] 0.400} 0.397] 0.394] 0.391] 0.388 
0.23 0.385) 0.382) 0.379] 0.376] 0.373] 0.370} 0.367] 0.365} 0.362] 0.358 
0.24 0.356) 0.353) 0.350) 0.347) 0.344] 0.341} 0.338] 0.335} 0.332] 0.329 
0.25 0.326) 0.324) 0.321) 0.318} 0.315] 0.313} 0.310] 0.307} 0.304] 0.301 
0.26 0.298] 0.295), 0.292) 0.290! 0.287] 0.284] 0.281] 0.278] 0.275] 0.272 
0.27 0.269} 0.267) 0.264) 0.261) 0.258] 0.255] 0.252] 0.250] 0.247] 0.244 
0.28 0.241) 0.238) 0.236) 0.233] 0.230] 0.227] 0.225] 0.222] 0.219] 0.216 
0.29 0.213} 0.210) 0.208) 0.205} 0.202} 0.200] 0.196] 0.194] 0.191] 0.188 
0.30 0.186} 0.183) 0.180) 0.177] 0.175] 0.172] 0.169] 0.166] 0.164] 0.160 
0.31 0.158) 0.155} 0.153) 0.150} 0.147} 0.144] 0.141] 0.139] 0.136] 0.133 
0.32 0.130) 0.127] 0.125] 0.122) 0.120] 0.117] 0.114]! 0.111] 0.109] 0.106 
0.33 0.103} 0.100) 0.098) 0.095} 0.092} 0.090] 0.087] 0.084! 0.081] 0.079 
0.34 0.076) 0.073} 0.070) 0.067] 0.065} 0.062} 0.059] 0.056] 0.054! 0.051 
0.35 0.048) 0.046) 0.043) 0.040] 0.038] 0.035] 0.032] 0.030/ 0.027] 0.024 
0.36 0.021) 0.019) 0.016) 0.013] 0.011] 0.008] 0.005! 0.002] —0.000 —0 .003 
0.37 —0 .006) —0 .008/ —0 .011] —0 .014| —0 017] —0 .019] —0 .022| —0 .025| —0 .028 —0.030 
0.38 —0 .033] —0 .036} —0 .038] —0 .041| —0 .044| —0 .047] —0 .049] —0 .052| —0 .055 —0 .057 
0.39 —0 .060] —0 .063) —0 .066] —0 .068] —0 .071| —0 .074| —0 .076| —0 .079] —0 .082 —0.085 
0.40 —0.087) —0 .090] —0 .093] —0 .096] —0 .098) —0 .101| —0 .104] —0.106] —0 .109 —0.112 
0.41 —0.115) —0.117} —0.120) —0.123] —0.126] —0 128] —0 131] —0.134 —0.137] —0.139 
0.42 —0.142) —0.145} —0.148) —0 150] —0.153] —0.156] —0.159] 0.161 —0.164) —0.167 
0.43 —0.170) —0 .172} —0.175| —0.178] —0.181| —0.183] —0.186] —0 189 —0.192) —0.194 
0.44 —0.197) —O .200} —0 .203] —0.206] —0 208] —0.211| —0 .214 —0.217| —0.219] —0 .222 
0.45 —0 .225] —0 .228) —0 .231] —0.233] —0 236] —0 239] —0 242] —0 245 —0.247) —0.250 
0.46 —0.253] —0 256] —0 .259] —0 261] —0 .264| —0 267] —0 .270 —0.273] —0 .275| —0.278 
0.47 —0.281| —0 .284| —0 .287] —0 289] —0 .292] —0 295] —0 298 —0.301| —0.304| —0 306 
0.48 —0.309) —0 .312] —0 315] —0 318] —0 321] —0 .324| —0 .326 —0 329] —0 332] —0.335 
0.49 -0.338 -0.341 -0.343 -0.346 -0.349 -0.352 -0.355 -0.358 -0.361 -0.364

TABLE I (Continued)


p     0.000   0.001   0.002   0.003   0.004   0.005   0.006   0.007   0.008   0.009

70; 50 —0.366) —0 .369] —0 .372| —0 .375| —0.378] —0 .381| —0.384] —0.387| —0.390| —0.393 
0.51 —0 395) —0 398] —0.401| —0 .404| —0.407| —0 410] —0.413] —0.416] —0 .419| —0 .422 
0.52 —0 .425|) —0 428} —0 431) —0.434| —0 436] —0 .439| —0 .442| —0 .445| —0 .448] —0.451 
0.53 0.454) —0 .457| —0.460| —0 463] —0 466] —0.469] —0 472] —0.475| —0.478| —0 .481 
0.54 —0 .484| —0 .487) —0.490} —0 493] —0.496] —0 499] —0.502] —0 .505] —0.508] —0.511 
0.55 —0.515} —0 518] —0 .521| —0 524] —0 .527] —0 .530] —0 .533| —0 .536] —0 539] —0.542 
0.56 —0.545| —0 548] —0.551] —0 554] —0 .557} —0 561] —0.564|] —0.567] —0.570] —0.573 
0.57 —0.576|) —0 .579) —0.582} —0 585] —0 589] —0 592} —0 595] —0.598] —0.601| —0.604 
0.58 —0.608} —0.611| —0.614| —0 .617| —0 .620) —0 .623} —0 .627) —0 630] —0.633] —0.636 
0.59 —0.639| —O .643| —0 646) —0 649] —0 652] —0.655] —0.659] —0 .662| —0 665] —0.668 
0.60 0.672) —0 .675| —0 .678| —0 .682| —0 .685| —0 688] —0 .691| —0.695| —0 .698] —0.701 
0.61 —0.705|) —0.708) —O.711| —0.715) —0 .718} —0 .721} —0 .725] —0 .728] —0 .731| —0.735 
0.62 —0.738) —0.742) —0.745| —0.748] —0 .752] —0.755| —0.758] —0 .762] —0 .765| —0.769 
0.63 —0.772| —0.774| —0 .779| —0 .782| —0 .786| —0 .789| —0.793}] —0.796| —0 .800] —0.803 
0.64 —0.807| —0.810} —0.814| —0.817) —0.821] —0.824| —0 828} —0.831] —0.835] —0.839 
0.65 —0.842| —0 846) —0 849} —0.853) —0 856] —0 .860} —0 .864| —0.867| —0.871} —0.875 
0.66 —0 .878) —0 .882| —0 .886| —0 .889] —0 .893) —0 .896] —0 .900] —0 .904] —0.908] —0.911 
0.67 —0_.915| —0.919| —0 .923| —0 .926} —0 .930| —0 .934| —0 .938| —0 .941| —0 .945| —0 .949 
0.68 —0 .953} —0 .957| —O .961| —0 .964| —0 .968} —0 .972} —0 .976] —0O .980| —0O .984| —0 .988 
0.69 0.991) —O .995| —0 .999/ —1 .003} —1 .007| —1.011| —1 .015) —1 .019] —1 .023) —1.027 
0.70 —1.031| —1.035| —1.039) —1 043) —1.047} —1.051] —1.055| —1.059] —1 .063] —1.067 
0.71 —1.072| —1.076] —1.080} —1.084} —1.088] —1 .092} —1.096} —1.101|] —1.10&) —1.109 
0.72 —1.113}) —1.117| —1.122|} —1.126} —1.130} —1.134) —1.139] —1.143) —1.147) —1.152 
0.73 —1.156} —1.160) —1.165} —1.169} —1.173} —1.178] —1.183] —1.187] —1.191] —1.196 
0.74 —1.200} —1.205| —1.209] —1.214) —1.218] —1 .223] —1.228} —1 .232) —1.237) —1.241 
0.75 —1.246| —1.250] —1.255| —1 260} —1.264| —1.269| —1.274| —1 279] —1.283] —1.288 
0.76 —1.293] —1.298] —1.303] —1 .308] —1.312] —1.317| —1.322| —1 327] —1 332] —1.337 
Onn —1.342| —1.347| —1.352] —1.357} —1.362| —1.367] —1.372| —1.377| —1.382] —1.387 
0.78 —1.392| —1.398] —1.403| —1.408] —1.413] —1.418] —1.424] —1.429] —1.434]) —1.440 
0.79 —1.445} —1.450] —1 456] —1.461] —1.467| —1.472| —1.478] —1.483| —1.489] —1.494 
0.80 —1.500] —1.506] —1.511| —1.517} —1.522] —1.528] —1.534] —1.540) —1.546) —1.551 
0.81 —1.557| —1.563] —1.569] —1.575| —1.581] —1.587| —1.593] —1.599) —1.605| —1.611 
0.82 —1.617| —1.624| —1.630] —1.636|] —1.642] —1.648] —1.654) —1.661] —1.667| —1.674 
0.83 —1.680] —1.687] —1.693| —1.700| —1.707| —1.713] —1.720] —1.727| —1.733] —1.740 
0.84 —1.746| —1.753| —1.760| —1.767| —1.774| —1.781| —1.789| —1.795) —1.802} —1.810 
0.85 —1.817| —1.824] —1.831|] —1.839] —1.846] —1.853] —1.861] —1.869] —1.876| —1.884 
0.86 —1 892] —1 .899] —1.907| —1.915] —1.923] —1.931] —1.939] —1.947] —1.95£) —1.963 
0 87 —1.971| —1 980] —1.988] —1.997| —2.005] —2.014] —2 .022| —2 030] —2 .039] —2 .048 
0.88 —2.057| —2.066| —2.075| —2.084| —2.093] —2.102] —2.112] —2.121) —2.130| —2.140 
0.89 —2.150| —2.159] —2.169| —2.179] —2.188] —2.199] —2 209] —2.219] —2 .229) —< 240 
0.90 —2.250| —2.260| —2.271] —2.283] —2.294| —2.305| —2.316) —2.327) —2.338) —2 350 
0.91 —2.361| —2 373] —2.385| —2.397| —2 .409] —2 1| —2 .434| —2 .445] —2 458] —2.471 
0.92 —2.484| —2 497] —2.511| —2.524| —2.537| —2.551| —2.565] —2 .580 —2.594| —2.608 
0.93 —2 .623| —2.638| —2.654| —2.668] —2.684| —2.700| —2.717) —2.732 —2.749| —2.766 
0.94 —2.782| —2 .800| —2.817] —2 .835] —2.854] —2.872| —2.891] —2.910 —2.930| —2.949 
0.95 —2.970| —2.990| —3 .012] —3 .032] —3 .055| —3.077] —3.101| —3 124| —3.149] —3.172 
0.96 —3.199] —3 .224| —3.249] —3.278] —3.305) —3 .335 —3 .364| —3.393| —3.427| —3.458 
0.97 —3.490| —3.527| —3.561] —3.597| —3 634] —3.677 —3.717| —3.759| —3 .803] —3.854 
0.98 —3 .902| —3 .953] —4.006] —4 .063| —4.129] —4.193 —4 262) —4 335] —4.415] —4.501 
0.99 -4.595 -4.699 -4.828 -4.962 -5.116 -5.298 -5.521 -5.809 -6.215 -6.908

TABLE II

         Maximum        Range        Weighting                    Maximum        Range        Weighting
                                     Coefficient                                              Coefficient
  Y    Y - 1/log P   1/(P log P)   P(log P)^2/(1 - P)      Y    Y - 1/log P   1/(P log P)   P(log P)^2/(1 - P)
1.90 2.0496 —114.9425 .0583 —1.05 1.8080 —4 0552 .2923 
1.80 1.9653 —68 .9655 .0879 —1.10 1.9039 —4.1911 .2804 
if A) 1.8827 —43 .4783 .1264 —1.15 2.0086 —4 3346 . 2690 
1.60 1.8019 —28 .4091 ailirdayss —1.20 2.1201 —4.4863 -2580. 
1.50 W723 —19.7628 .2294 —1,.25 2.2404 —4 .6490 2473 
1.40 1.6466 —14 .2450 .2897 —1.30 2.3697 —4.8193 .2369 
1.30 1.5725 —10 6838 .3524 —1.35 2.5080 —5 .0000 2269 
1.20 t 5012 —8§.3195 .4141 —1.40 2.6552 —5.1894 2174 
1.10 1.4329 —6.7114 .4710 —1.45 2.8126 —5.3908 .2080 
1.00 1.3679 —5.5741 no2 zi —1.50 2.9823 —5.6022 .1990 
0.95 1.3368 —5.1282 .5453 —1.55 3.1603 —5.8241 .1908 
0.90 1.3066 —4.7551 5657 —1.60 3.3529 —6.0606 .1822 
0.85 1.2774 —4 4346 .5839 —1.65 3.5556 —6.3091 .1739 
0.80 1.2493 —4.1597 .5998 —1.70 SB (SED —6.5703 .1665 
(0) 745) e223 —3.9231 .6135 —1.75 4.0037 —6.8446 . 1592 
4 
* 0.70 1.1966 —3.7202 6247 —1.80 4.2496 —7.1378 1522 
0.65 1.1720 —3.5436 .6340 —1.85 4.5113 —7.4460 .1450 
0.60 1.1488 —3 .3944 .6403 —1.90 4.7845 —7.7640 .1389 
0.55 1.1270 —3.2648 6448 —1.95 5.0774 —8.1037 .1827 
0.50 1.1066 —3.1546 .6470 —2.00 5.3910 —8 .4602 .1265 
0.45 1.0876 —3 .0600 .6474 —2.10 6.0633 —9 .2251 1154 
0.40 1.0703 —2.9789 6462 —2.20 6.8253 —10.0806 .1049 
0.35 1.0547 —2.9129 6427 —2.30 7.6701 —11.0254 .0954 
0.30 1.0409 —2.8571 6378 —2.40 8.6254 —12.0773 .0865 
0.25 1.0288 —2.8129 .6313 —2.50 9.6803 —13 .2275 .O787 
0.20 10187, —2.7770 6237 —2.60 10.8590 —14 4928 .0712 
OS) 1.0107 —2.7503 6149 —2.70 12.1810 —15 .9236 0646. 
0.10 1.0047 —2.7322 6047 —2.80 13 .6474 —17.4825 .0593 
0.05 1.0012 —2.7218 .5937 —2.90 15.2818 —19.1939 .0542 
0.00 1.0000 —2,7181 .5820 —3.00 17 .0803 —21.0970 .0494 
—0.05 1.0013 —2.7218 5695 —3.10 19.1222 —23 .2558 .0432 
—0.10 1.0052 —2.7315 .5563 —3.20 21.3098 —25 .5102 .0400 
—0.15 1.0118 —2.7473 .5429 —3.30 23 .8003 —28.0899 .0359 
—0.20 1.0214 —2.7693 .5289 —3.40 26.5401 —30 .9598 .0335 
—0.25 1.0340 —2.7972 .5146 —3.50 29 6126 —34 .1297 .0303 
—0.30 1.0499 —2.8321 .4999 —3 .60 33 .0300 —37 .5940 .0260 
—0.35 1.0690 —2.8711 4853 —3.70 36.7858 —41.4938 .0246 
—0.40 1.0919 —2.9163 4705 —3.80 40 8429 —45 .6621 .0226 
—0.45 1.1184 —2.9674 4559 —3.90 45 .6050 —50.5051 .0200 
—0.50 1.1488 —3 .0239 .4412 —4.00 50.6448 —55 .5556 .0166 
—0.55 1.1834 —3 .0864 4263 —4.10 56.1410 —61 .3497 .0182 
—0.60 1.2222 —3.1546 .4119 —4.20 62.4667 —67 .5676 .0134 
—0.65 1.2657 —3 .2289 .38976 —4.30 69 .2294 —74 6269 .0148 
—0.70 L3l37z —3.3091 .3835 —4.40 76.9008 —82 .6446 .0082 
-0.75 1.3669 -3.3956 .3695 -4.50 85.5901 -90.9091 .0091

TABLE II (Continued)

         Maximum        Range        Weighting                    Maximum        Range        Weighting
                                     Coefficient                                              Coefficient
  Y    Y - 1/log P   1/(P log P)   P(log P)^2/(1 - P)      Y    Y - 1/log P   1/(P log P)   P(log P)^2/(1 - P)

-0.80 1.4257 -3.4880 .3559 -4.60 94.4099 -100.0000 .0100
-0.85 1.4897 -3.5868 .3427 -4.70 105.1901 -111.1111 .0111
-0.90 1.5594 -3.6928 .3295 -4.80 117.1512 -123.4568 .0123
-0.95 1.6360 -3.8066 .3168 -4.90 130.2351 -136.9863 .0135
-1.00 1.7181 -3.9262 .3044 -5.00 144.2587 -149.2537 .0000


\[ b = \frac{S[nwy_w(x - \bar{x})]}{S[nw(x - \bar{x})^2]} = \frac{S[wy_w(x - \bar{x})]}{S[w(x - \bar{x})^2]} \]


the factor of n = 18 need never appear in their calculation. The sums
of squares and products obtained using only w must, however, be
multiplied by n = 18 to give the true values.

The data of Table 2 give us


S(w) = 2.3001,  S(wx) = 40.4686,  S(wy_w) = 0.135944


and hence


\[ S[w(y_w - \bar{y}_w)^2] = S(wy_w^2) - \frac{S^2(wy_w)}{S(w)} = 3.100673 - 0.008035 = 3.092638 \]

\[ S[wy_w(x - \bar{x})] = S(wxy_w) - \frac{S(wx)\,S(wy_w)}{S(w)} = -5.73134 - 2.39184 = -8.12318 \]

\[ S[w(x - \bar{x})^2] = S(wx^2) - \frac{S^2(wx)}{S(w)} = 733.6140 - 712.0158 = 21.5982 \]


Then

\[ b = \frac{-8.12318}{21.5982} = -0.37610 \]

and the sum of squares of $y_w$ accounted for by the regression line is

\[ \frac{18(-8.12318)^2}{21.5982} = 54.9930. \]

The total sum of squares of $y_w$ is, of course, 18 × 3.09264 = 55.6675.


The analysis of variance, or rather of $\chi^2$ since in such a weighted
analysis the sums of squares are $\chi^2$'s, then becomes


Item          $\chi^2$     N     P
Regression    54.9930      1     v. small
Remainder      0.6745      6     > 0.99
Total         55.6675      7


The experiment led to 8 observed proportions, so giving 7 degrees of
freedom of which 1 corresponds to the $\chi^2$ for regression and 6 to the
remainder. The latter item has a very high probability and so affords
no ground for regarding the data as inhomogeneous or the linear
regression as inadequate.

Where the remainder $\chi^2$ is not significant, we find the sampling
variances of a and b as

\[ V_a = \frac{1}{S(nw)} = \frac{1}{41.4018} = 0.024154 \]

and

\[ V_b = \frac{1}{S[nw(x - \bar{x})^2]} = \frac{1}{388.7676} = 0.002572; \]

thus

\[ a = \bar{y}_w = 0.0591 \pm 0.1554 \quad\text{and}\quad b = -0.3761 \pm 0.0507. \]

Our new approximation to the loglog time relation is

\[ Y = 0.0591 - 0.3761\,(x - 17.5943) = 6.6763 - 0.3761\,x \]

from which the values of Y in the last column of Table 2 are computed.

This second approximation can in turn be used as the basis for calculating
the third approximation. In the present case the fresh calculation
gives


a = -0.0469 and b = -0.3858


both of which values lie within the ranges covered by the standard
errors of the values found by the first calculation. The remainder, or
heterogeneity, $\chi^2$ is indeed raised a little by the second calculation to
0.8988, so that clearly the second approximation is as good as the
third and the second calculation redundant. Evidently the trial line 
fitted by eye and used as the basis of the first calculation was sufficiently 
good for the first calculation to provide an adequate correction. The 
three lines, visual trial, first calculated and second calculated are 
shown in Fig. 1. Their agreement is obviously close. 
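
The arithmetic of the first calculated line can be retraced from the x, w and $y_w$ columns of Table 2. The short Python sketch below is an illustration only, not part of the original paper; the helper `S` and the variable names are hypothetical, and the inputs are the rounded values as printed in the table, so results agree with the text only to about the last printed decimal.

```python
import math

# Columns x, w and y_w as printed in Table 2; n = 18 samples at each time.
x = [12, 14, 16, 18, 20, 22, 24, 26]
w = [0.0559, 0.4128, 0.6472, 0.5288, 0.3293, 0.1822, 0.0954, 0.0485]
yw = [2.0496, 1.5012, 0.5807, -0.0557, -0.9022, -1.3612, -2.7386, -2.8522]
n = 18

def S(*cols):
    """Sum over the rows of the product of the given columns."""
    return sum(math.prod(row) for row in zip(*cols))

Sw, Swx, Swy = S(w), S(w, x), S(w, yw)
xbar = Swx / Sw                        # weighted mean of x
a = Swy / Sw                           # weighted mean of y_w
Sxx = S(w, x, x) - Swx ** 2 / Sw       # S[w (x - xbar)^2]
Sxy = S(w, x, yw) - Swx * Swy / Sw     # S[w y_w (x - xbar)]
Syy = S(w, yw, yw) - Swy ** 2 / Sw     # S[w (y_w - ybar_w)^2]
b = Sxy / Sxx

# The factor n = 18 enters only the sums of squares and the variances.
chi2_regression = n * Sxy ** 2 / Sxx
chi2_total = n * Syy
chi2_remainder = chi2_total - chi2_regression
se_a = math.sqrt(1.0 / (n * Sw))
se_b = math.sqrt(1.0 / (n * Sxx))
intercept = a - b * xbar               # Y = intercept + b x

print(a, b, intercept, chi2_regression, chi2_remainder, se_a, se_b)
```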

Having found the line relating loglog to time we can calculate any 
point on it which one may choose to regard as a convenient measure 
of its position and hence the potency of the disinfecting action, just as 
we find the ED50 as a convenient characteristic of the probit regression 
line. We could in fact take the exposure time which gave a proportion 
of 50% —ve samples, but this proportion has no special significance 
in terms of the bacteria—it corresponds to a mean number of 0.69 
surviving bacteria per sample. It seems more appropriate to base 
comparisons on the time at which an average of 1 bacterium survives
in the sample. Then $\lambda$ = 1, and P = 0.36788 and Y = 0.

In the present case Y = 0 when x = 17.751 ± 0.414 minutes, the
standard error being found from the formula, also used in probit analysis,


\[ s_{x_0}^2 = \frac{1}{b^2}\left\{ \frac{1}{S(nw)} + \frac{(x_0 - \bar{x})^2}{S[nw(x - \bar{x})^2]} \right\} \]


where $x_0$ is the value of x whose standard error it is required to know.
The second calculation gives the single mean survivor time as 17.804
minutes, again well within a standard error of that from the first
calculation.
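
As a check on these figures, the single mean survivor time and its standard error can be recomputed from the quantities already quoted. The sketch below is illustrative only; the function name `time_at_loglog` and its argument names are hypothetical, and the inputs are the rounded values of a, b, the weighted mean of x, S(nw) = 41.4018 and S[nw(x - x̄)²] = 388.7676 given in the text.

```python
import math

def time_at_loglog(Y0, a, b, xbar, S_nw, S_nwxx):
    """Time x0 at which the fitted line Y = a + b (x - xbar) reaches Y0,
    with its standard error from the formula in the text."""
    x0 = xbar + (Y0 - a) / b
    variance = (1.0 / S_nw + (x0 - xbar) ** 2 / S_nwxx) / b ** 2
    return x0, math.sqrt(variance)

# First calculated line: a = 0.0591, b = -0.3761, xbar = 17.5943.
x0, se_x0 = time_at_loglog(0.0, a=0.0591, b=-0.3761, xbar=17.5943,
                           S_nw=41.4018, S_nwxx=388.7676)
print(round(x0, 3), round(se_x0, 3))
```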

Two disinfectant treatments can be compared in their action on a 
standard bacterial suspension through their single mean survivor times, 
just as by comparison of ED50's in probit analysis, provided the slopes
of their loglog regression lines may be regarded as alike. Again this 
test of similarity of slope is made in a way exactly comparable to the 
corresponding test in probit analysis. 

One final comparison of loglog and probit analyses. In the latter
the most informative observations, the ones with the greatest weighting
coefficient, are those where P = 0.50 giving a probit value of 5.0. This
is not true of loglogs. The amount of information per unit observation,
i.e. the weighting coefficient w, is plotted against P in Fig. 2. For the
most informative observation P lies between 0.2 and 0.3 with the
maximum w = 0.6476 when P = 0.2032. This value of P is the same
as the constant $e^{-1.5936}$ which Fisher (1947, Section 68) has found as the
proportion of sterile samples yielding most information about the mean
number of bacteria in the culture. He was especially concerned with
estimation of the number of organisms by the dilution method, and he
points out that a method depending on the mere recording of presence
or absence of bacteria is at best much less informative than one which
counts the bacteria in each sample. It would therefore appear that,
observation for observation, the precision of the method of extinction
times in bioassay is much less than can be achieved by plating and
counting the survivors; though on the other hand it is of course also a
much less time consuming method.


FIGURE 2

The relation of the amount of information or weighting factor (w) of the loglog to p the proportion of
negative samples. $w = p(\log p)^2/q$


SUMMARY


In the method of extinction times as applied to disinfection of bacteria,
the observational distinction is between samples which contain
either none or 1 or more of active bacteria surviving after exposure to
the disinfectant for given times. The data thus consist of proportions
of samples failing to contain survivors after a series of exposure times.
This proportion after any time is $e^{-\lambda}$ where $\lambda$ is the mean number of
survivors after that particular exposure. The mean is expected to fall
off sufficiently nearly logarithmically with time. A rectilinear relation
to time can thus be obtained by the loglog transformation Y = log
(-log P).

The calculation of the regression line relating loglog to time follows
the same course as probit analysis, but with the weighting coefficient


\[ w = \frac{P (\log P)^2}{Q} \]


and the working loglog


\[ y_w = Y + \frac{p - P}{P \log P}. \]


The loglog transformation also differs from the probit transforma- 
tion in not being symmetrical round P = 0.5. For the most informative 
observations P has the value 0.2032, but any observation where P lies 
between 0.2 and 0.3 will closely approach this maximum in the in- 
formation it yields. 
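
The position of this maximum is easy to check numerically. The sketch below is an illustration under stated assumptions, not part of the original paper: it assumes natural logarithms and the weighting coefficient w = P (log P)² / (1 - P), and it uses a plain grid search over P (a hypothetical choice made for simplicity, not a method taken from the text). It also evaluates w at P = e⁻¹, the proportion at which the mean number of survivors is 1.

```python
import math

def weight(P):
    """Weighting coefficient w = P (log P)^2 / (1 - P), natural logs."""
    return P * math.log(P) ** 2 / (1.0 - P)

# Grid search over P = 0.00001, 0.00002, ..., 0.99999.
grid = [i / 100000 for i in range(1, 100000)]
P_best = max(grid, key=weight)
w_max = weight(P_best)

# Information at the single mean survivor point, relative to the maximum.
ratio_at_lambda_1 = weight(math.exp(-1.0)) / w_max

print(round(P_best, 4), round(w_max, 4), round(ratio_at_lambda_1, 3))
```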

A sample calculation is given and it is suggested that the place of
the ED50 in probit analysis can be taken by the single mean survivor
time, i.e. the time at which $\lambda$ = 1, Y = 0 and P = 0.3679, in loglog
analysis. The weighting coefficient is 90% of its maximum value at this
proportion.


THE GENERAL THEORY OF PRIME-POWER
LATTICE DESIGNS


III. THE ANALYSIS FOR $p^3$ VARIETIES IN BLOCKS OF p PLOTS WITH MORE
THAN 3 REPLICATES.*


WALTER T. FEDERER†


INTRODUCTION 


The analysis for $k^3$ varieties or treatments in blocks of k plots for
3 replicates has been given by Yates [6]; k may be any of the integers
2, 3, 4, 5, 6, etc. In addition, he indicated the appropriate method of
analysis for multiples of 3 replicates. At the same time he named this
design a "three-dimensional lattice" while others [1, 2] have designated
it as the "cubic lattice". In this paper the design will be known as a
"3-dimensional lattice with one restriction", the restriction being that
the whole block or replicate will be divided into $p^2$ (p = a prime number)
incomplete blocks to which groups of p varieties are assigned at random.

The present paper is the third of a series of publications on prime-power
lattice designs. The first two papers [4, 5] presented the theory.
The purpose of this paper is to illustrate, with a numerical example, the
analysis for $p^3$ varieties in incomplete blocks of p varieties for more than
3 replicates. Although the numerical example contains $3^3$ varieties in
blocks of 3 plots with 4 replicates, the computational procedures are
applicable for p = 2, 3, 5, 7, 11, etc. and for 4, 5, etc. replicates.


DESCRIPTION OF THE NUMERICAL EXAMPLE 


Uniformity trial yield data [7] have been given for corn where the
smallest unit of observation was pounds of ear corn for a 2 X 5 hill plot.
The 4 X 5 and 2 X 10 hill plot yields were obtained by grouping 2 of
the units of observations. The 2 X 10 hill plot yields were used to construct
the numerical example (Table 1) illustrating the analysis for a $3^3$
lattice in blocks of 3 plots with 4 replicates. The extension to $p^3$ varieties
in blocks of p plots and to 5, 6, etc. replicates will be apparent from the
explanation accompanying the present sample.


*Contribution of the Statistical Section of the Iowa Agricultural Experiment Station in co-operation
with the Bureau of Agricultural Economics, U.S.D.A., Journal Paper No. 1603, Project 890.


†Formerly Agricultural Statistician, Bureau of Agricultural Economics, collaborating with the
Iowa Agricultural Experiment Station.

The incomplete block size was three 2 X 10 hill plots or 6 X 10 hills.
The replicate size was 6 X 90 hills which may not be the most efficient
shape of replicate for a randomized complete block design [3].

The varieties are designated as ijk where i = 0, 1, or 2, j = 0, 1, or 2
and k = 0, 1, or 2. The 27 variety numbers run from 000 to 222 (Table
1). The subscripts of the factors a, b, and c may be denoted as i, j, and k,
respectively. This follows the notation of a true $p^3 = 3^3$ factorial experiment
where the 3 = p levels of the 3 factors are in all possible combinations.
The relationship is purely conventional and is useful in constructing
an incomplete block design to determine which varietal comparisons
are confounded with incomplete block differences. The factors a, b, and
c are called pseudo-factors and the main effects and interactions pseudo-effects
in an experiment which is not a true factorial but which makes
use of factorial notation.

Groups of varieties (those making up a pseudo-effect) were assigned 
to the incomplete blocks at random and the variety or treatment designations
were assigned to the 2 X 10 hill plots within the incomplete
blocks at random. The field randomization and plot yields in pounds of 
ear corn per 2 X 10 hill plot are given in Table 1; the variety designa- 
tions are given in parentheses. The incomplete block and replicate 
totals and the grand total are also given (Table 1). In the event that 
Table 1 is not constructed the above totals could be inserted in the field 
books in the appropriate places. 

The pseudo-effects are obtained by taking certain combinations of
varietal yields. Considering a single replicate, all 27 = $p^3$ varietal
yields are used to obtain the 3 = p levels of the pseudo-effect. The comparison
among the 3 = p totals yields 2 = (p - 1) degrees of freedom
for the particular pseudo-effect under consideration. For effect $(A)_0$, all
the plots are summed for the pseudo-factor $a_i$ where i = zero; the varietal
yields used to obtain $(A)_0$ are those listed under A for i = zero in
Table 2. For $(A)_1$ the varietal yields summed are those listed under A
for i = 1 (Table 2) and for $(A)_2$ those listed under A for i = 2. The
remaining main effects and interactions may be obtained in a similar
manner (Table 2), if it is remembered that the powers and subscripts
of the pseudo-effects are reduced to modulo p = 3 (that is, divided by
p = 3 and the remainder substituted) [4].

In replicate I, the pseudo-effects A, B, AB, and $AB^2$ are confounded
with the differences among the incomplete blocks. Each pseudo-effect
has 2 = (p - 1) degrees of freedom. There are 8 = ($p^2$ - 1) degrees
of freedom confounded with the differences among the 9 = $p^2$ incomplete
blocks. The pseudo-effects A, C, AC, and $AC^2$ are confounded in
replicate II; B, C, BC, and $BC^2$ in replicate III; and $AB^2$, AC, BC, and
$ABC^2$ in replicate IV. A total of 32 = r($p^2$ - 1) (r = 4 = number of
replicates) degrees of freedom are confounded in the 4 replicates. The
confounding in replicates I, II, and III corresponds to that for Yates' [6]
Z, Y, and X replicates, respectively.

Table 2 need not be constructed for each experiment but may be used 
for all succeeding experiments after it has once been constructed for a 
given value of p. 


PROCEDURE FOR COMPUTATIONS 


In addition to the totals obtained in Table 1, another table of totals
is needed for the analysis of this and similar designs. The $3^2$ yields in
Table 1 which correspond to varieties listed under each level of a pseudo-effect
in Table 2 were summed by replicates to obtain the totals in Table
3. Thus for the pseudo-effect $(ABC)_1$, the 9 varieties making up this
total are (from Table 2)


001, 010, 022, 100, 112, 121, 202, 211, and 220.

The sum of the yields of these 9 varieties in replicate I (Table 1) is

32.5 + 34.0 + 35.1 + 32.6 + 30.6 + 30.0 + 33.8 + 30.6 + 31.1 = 290.3.


The remainder of the pseudo-effects were obtained in a similar manner. 
The method for obtaining the last 3 columns of Table 3 is explained in a 
later section of the paper. The main effects and interactions confounded 
in the various replicates are indicated by italics in Table 3. 
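
The grouping of varieties into the levels of a pseudo-effect can be generated mechanically. The sketch below is illustrative and rests on an assumption not spelled out in this section: that level u of the pseudo-effect with exponents (a, b, c) consists of the varieties ijk for which a·i + b·j + c·k leaves remainder u on division by p. The function name `level_members` is hypothetical. With exponents (1, 1, 1) and u = 1 this rule gives exactly the nine varieties listed above for the ABC total.

```python
from itertools import product

def level_members(exponents, level, p=3):
    """Varieties ijk with (a*i + b*j + c*k) mod p equal to `level`.

    `exponents` = (a, b, c) are the powers of A, B and C in the pseudo-effect,
    e.g. (1, 2, 0) for AB^2 and (1, 1, 1) for ABC (assumed convention).
    """
    a, b, c = exponents
    return ["%d%d%d" % (i, j, k)
            for i, j, k in product(range(p), repeat=3)
            if (a * i + b * j + c * k) % p == level]

abc_1 = level_members((1, 1, 1), 1)
print(abc_1)

# Replicate I yields of these nine varieties, as quoted in the text.
yields_rep1 = [32.5, 34.0, 35.1, 32.6, 30.6, 30.0, 33.8, 30.6, 31.1]
rep1_total = sum(yields_rep1)
print(round(rep1_total, 1))
```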

The totals in Tables 1 and 3 and the unadjusted variety totals 
(Table 4) are all that are required to obtain the sums of squares for the 


analysis of variance. 


I. CALCULATIONS FOR THE ANALYSIS OF VARIANCE


A procedure for obtaining the analysis of variance for a 3-dimensional 
lattice with one restriction is given below: 


1. Correction term:


\[ \frac{(\text{grand total})^2}{\text{total number}} = \frac{(3235.3)^2}{108} = 96{,}918.20 = CT. \]


2. Total sum of squares (from Table 1):

\[ (30.6)^2 + (32.0)^2 + \cdots + (28.1)^2 - CT = 97{,}680.03 - CT = 761.83 \]


150 BIOMETRICS, JUNE 1949 


TABLE 4 


UNADJUSTED TOTALS (POUNDS OF EAR CORN) FOR THE 
27 VARIETIES IN TABLE 1 


Variety   Total     Variety   Total
number    (lbs.)    number    (lbs.)
000 1IB3., 7 112 111.9 
001 HiGe2 * 11240) 1bls5 (0) 
002 ible, SH 121 119.3 
010 1223 122 119.9 
O11 119.7 200 119.6 
012 121 201 118.9 
020 Sat 202 116.3 
021 126.3 210 118.2 
022 130.0 211 122.4 
100 122.9 212 Hale ©) 
101 125.9 220 aie 
102 117.9 221 114.4 
110 eae 222 120.0 

il AH al 
Total 3235.3 


3. Replicate sum of squares (totals from Table 1):


\[ \frac{(855.5)^2 + (827.1)^2 + (788.1)^2 + (764.6)^2}{p^3 = 27} - CT = 97{,}099.61 - CT = 181.41 \]


4. Variety sum of squares (ignoring blocks) (totals from
Table 4):


\[ \frac{(113.7)^2 + (116.2)^2 + \cdots + (120.0)^2}{4} - CT = 97{,}033.42 - CT = 115.22 \]



5. The randomized block error sum of squares is obtained
by subtraction of the replicate and variety sum of squares
from the total,


761.83 - 181.41 - 115.22 = 465.20.
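
Steps 1, 3 and 5 involve only totals quoted in the text and can be retraced as follows. This is an illustrative sketch, not part of the original paper; the variable names are hypothetical, and the total and variety sums of squares are taken as given (761.83 and 115.22) because the individual plot yields of Table 1 are not reproduced here. Working at full precision, the results may differ from the printed figures by about a unit in the second decimal place.

```python
# Totals quoted in the text.
grand_total = 3235.3
n_plots = 108                               # 4 replicates of 27 plots
rep_totals = [855.5, 827.1, 788.1, 764.6]   # replicates I to IV
plots_per_rep = 27

# Step 1: correction term.
ct = grand_total ** 2 / n_plots

# Step 3: replicate sum of squares.
rep_ss = sum(t * t for t in rep_totals) / plots_per_rep - ct

# Step 5: randomized block error by subtraction (steps 2 and 4 taken as given).
total_ss = 761.83
variety_ss = 115.22
rb_error_ss = total_ss - rep_ss - variety_ss

print(round(ct, 2), round(rep_ss, 2), round(rb_error_ss, 2))
```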


6. The sum of squares for blocks eliminating the varietal
effect may be obtained as the sums of squares for the
interaction of levels of the confounded pseudo-effects
with replicates and the sum of squares for the comparisons
of the mean confounded versus the mean unconfounded
effects. In the present example the inter-block
error sum of squares will be derived from 3 sources
which will be designated as components (a), (b), and (c).


(i) Component (a)


The component (a) sum of squares is the sum of the
interaction sum of squares of the 3 = p levels 0, 1, and 2,
of the effects A, B, $AB^2$, C, AC, and BC with the replicates
in which they are confounded. The interaction
sum of squares for the $AB^2$ effect may be derived from
the following 2-way table:


               Rep. I    Rep. IV

$(AB^2)_0$
$(AB^2)_1$
$(AB^2)_2$


This interaction yields 2 degrees of freedom. There 
will be 2 from each of the 6 interactions or a total of 12 
degrees of freedom for the component (a) sum of squares. 

The within replicate sums of squares were obtained 
for all effects (Table 5); they may be used to obtain the 
interaction sum of squares. The within replicate I sum 
of squares for A is 


$$\frac{(A)_0^2 + (A)_1^2 + (A)_2^2}{9} - \frac{[(A)_0 + (A)_1 + (A)_2]^2}{27} = \frac{(290.4)^2 + (277.9)^2 + (287.2)^2}{9} - \frac{(855.5)^2}{27} = 9.37.$$


By making use of the totals in Table 3 the remainder of 
the within replicate sum of squares in Table 5 may be 
obtained. The within replicate and the replicate sums 
of squares should add to the total sum of squares within 
rounding errors. 

The interaction sum of squares may be obtained 
from 2-way tables as described above or by adding the 
effect sum of squares within the replicates in which it is 
confounded and then subtracting the overall effect sum 
of squares. Thus for the interaction of $(A)_0$, $(A)_1$, and
$(A)_2$ with replicates I and II (the effect $A$ is confounded
with the differences among incomplete blocks in replicates
I and II) the sum of squares is

$$9.37 + 18.61 - \left[\frac{(290.4 + 284.8)^2 + (277.9 + 275.8)^2 + (287.2 + 266.5)^2}{2p^2 = 18} - \frac{(855.5 + 827.1)^2}{2p^3 = 54}\right] = 9.37 + 18.61 - 17.12 = 10.86.$$

The remaining interaction sums of squares (Table 5) for 
component (a) are obtained in a similar manner. 


(ii) Component (b)


The component (b) sum of squares is the combined 
sums of squares for the comparison of the mean con- 
founded level of the effect in 2 of the replicates and the 
mean unconfounded level of the effect in the other 2 
replicates. For example, A is confounded in replicates 
I and II and unconfounded with the incomplete block 
differences in replicates III and IV. The sum of squares 
for this comparison is 


$$\frac{(290.4 + 284.8 - 256.9 - 253.0)^2 + (277.9 + 275.8 - 264.7 - 262.9)^2 + (287.2 + 266.5 - 266.9 - 248.7)^2}{p^2(1 + 1 + 1 + 1) = 36} - \frac{(855.5 + 827.1 - 788.1 - 764.6)^2}{4p^3 = 108} = 29.30.$$

The sums of squares for the effects $B$, $AB^2$, $C$, $AC$, and $BC$
are obtained similarly (Table 5). These comparisons
yield a total of 12 degrees of freedom, 2 for each effect.


(iii) Component (c) 


The component (c) sum of squares represents the
sums of squares for the comparisons of the mean confounded
effect in one replicate and the mean unconfounded
effect in the other 3 replicates. In the numerical
example (Table 1) there are 4 effects, $AB$, $AC^2$, $BC^2$, and
$ABC^2$, which are confounded in one replicate and unconfounded
in the other 3. Each comparison yields 2 degrees
of freedom, making a total of 8 degrees of freedom for
the component (c) sum of squares.

The sum of squares among the 3 differences adjusted 
for the mean difference for effect AB is 


$$\frac{[3(275.5) - 274.1 - 254.6 - 256.1]^2 + [3(290.7) - 276.7 - 266.0 - 252.0]^2 + [3(289.3) - 276.3 - 267.5 - 256.5]^2}{p^2(9 + 1 + 1 + 1) = 108} - \frac{[3(855.5) - 827.1 - 788.1 - 764.6]^2}{12p^3 = 324} = 6.30.$$


The sum of squares for the remaining 3 effects (Table 5) 
are obtained by a similar procedure. 

The intrablock error sum of squares is obtained by subtracting
the replicate, variety (ignoring blocks), and the
block (eliminating varieties) sums of squares from the
total sum of squares, thus

$$761.83 - 181.41 - 115.22 - 341.63 = 123.57.$$


The analysis of variance of the data in Table 1 is given in Table 6 for
both the randomized complete block and the incomplete block designs.
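
The subtraction steps above can be retraced numerically. The following Python sketch is only a check under stated assumptions: it uses the replicate totals, the level totals of A in replicates I and II, and the sums of squares 761.83, 115.22 and 341.63 exactly as quoted in the text, and all variable and function names are illustrative rather than the author's.

```python
# Totals quoted in the text: 4 replicates of 27 plots, grand total 3235.3.
rep_totals = [855.5, 827.1, 788.1, 764.6]
grand_total = 3235.3
n_plots = 108

# Correction term CT = (grand total)^2 / (number of plots).
ct = grand_total ** 2 / n_plots

# Replicate sum of squares: each replicate total is a sum of 27 plots.
ss_rep = sum(t ** 2 for t in rep_totals) / 27 - ct


def within_rep_ss(levels):
    """Sum of squares among the 3 level totals (9 plots each) of one
    effect inside a single replicate."""
    return sum(x ** 2 for x in levels) / 9 - sum(levels) ** 2 / 27


a_rep1 = [290.4, 277.9, 287.2]  # (A)0, (A)1, (A)2 in replicate I
a_rep2 = [284.8, 275.8, 266.5]  # the same levels in replicate II

# Overall A sum of squares in replicates I and II together
# (18 plots per level, 54 plots in all).
pooled = [x + y for x, y in zip(a_rep1, a_rep2)]
ss_a_overall = sum(x ** 2 for x in pooled) / 18 - sum(pooled) ** 2 / 54

# Interaction of A with replicates I and II, one piece of component (a).
ss_a_by_rep = within_rep_ss(a_rep1) + within_rep_ss(a_rep2) - ss_a_overall

# Error terms by subtraction, using sums of squares quoted in the text.
ss_total, ss_varieties, ss_blocks = 761.83, 115.22, 341.63
ss_rcb_error = ss_total - 181.41 - ss_varieties
ss_intrablock = ss_rcb_error - ss_blocks

print(round(ss_rep, 2), round(ss_a_by_rep, 2), round(ss_intrablock, 2))
```

Small differences in the last decimal place are to be expected, since the text rounds intermediate quantities before subtracting.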

An estimate of $\sigma_e^2$ is obtained as the intrablock error variance,
2.686 (Table 6). An estimate of the amount of intrablock information
is given by

$$w = \frac{1}{\sigma_e^2} = \frac{1}{2.686} = 0.3723008.$$


TABLE 6.
ANALYSES OF VARIANCE

AS RANDOMIZED COMPLETE BLOCK DESIGN

Source of variation      d.f.   Sum of squares   Mean square
Replicates                 3       181.41           60.47
Varieties                 26       115.22            4.43
Error                     78       465.20            5.96
Total                    107       761.83

AS INCOMPLETE BLOCK DESIGN WITH BLOCKS OF 3 VARIETIES

Source of variation      d.f.   Sum of squares   Mean square   Average value of mean square
Replicates                 3       181.41           60.47
Blocks (elim. var.)       32       341.63           10.676     $\sigma_e^2 + (3/4)p\sigma_b^2$
  Component (a)           12       147.59           12.299     $\sigma_e^2 + p\sigma_b^2$
  Component (b)           12        71.42            5.952     $\sigma_e^2 + (2/4)p\sigma_b^2$
  Component (c)            8       122.62           15.328     $\sigma_e^2 + (3/4)p\sigma_b^2$
Varieties                 26       115.22            4.432
Intrablock error          46       123.57            2.686     $\sigma_e^2$
Total                    107       761.83


The total of the sums of squares for components (a), (b), and (c) gives
the sum of squares for blocks (eliminating varieties) with 32 degrees of
freedom (8 degrees of freedom are confounded with incomplete block
differences in each of the 4 replicates). The corresponding mean square
has the expectation $\sigma_e^2 + (3/4)\,3\sigma_b^2$ (see Kempthorne and Federer [4]),
where $\sigma_e^2$ is the expectation of the intrablock error variance and $\sigma_b^2$ is the
expectation of the additional variance due to the variation among the
incomplete block means freed of varietal effects. In general the expectation
of the blocks (eliminating varieties) mean square is $\sigma_e^2 + [(r - 1)/r]\,p\,\sigma_b^2$,
where r is the number of replicates and p, a prime number, is the
number of plots in the incomplete block. Using the blocks (eliminating
varieties) mean square to obtain an estimate of the interblock error mean
square, which is equal to 1/w′ or the reciprocal of the amount of interblock
information, w′ is estimated by


$$w' = \frac{1}{\sigma_e^2 + 3\sigma_b^2} = \frac{1}{\frac{4}{3}(10.676) - \frac{1}{3}(2.686)} = \frac{1}{13.339} = 0.0749681.$$
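
As a check on the two weights, the sketch below solves the expectation of the blocks (eliminating varieties) mean square for the interblock variance. It assumes the reading $\sigma_e^2$ for the intrablock variance and $\sigma_b^2$ for the block component (the subscripts are hard to read in the source), and the variable names are illustrative.

```python
# Mean squares quoted in Table 6.
intrablock_ms = 2.686   # estimates sigma_e^2
blocks_ms = 10.676      # estimates sigma_e^2 + ((r - 1) / r) * p * sigma_b^2
r, p = 4, 3             # replicates, plots per incomplete block

# Intrablock information is the reciprocal of the intrablock variance.
w = 1.0 / intrablock_ms

# Solve the expectation for p * sigma_b^2, then form the interblock
# variance sigma_e^2 + p * sigma_b^2 = 1 / w'.
p_sigma_b2 = (blocks_ms - intrablock_ms) * r / (r - 1)
interblock_variance = intrablock_ms + p_sigma_b2
w_prime = 1.0 / interblock_variance

# The text appears to invert the rounded value 13.339, which accounts
# for the last digits of 0.0749681.
w_prime_as_printed = 1.0 / round(interblock_variance, 3)

print(round(w, 7), round(interblock_variance, 3), round(w_prime_as_printed, 7))
```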


II. EFFICIENCY OF THE INCOMPLETE BLOCK DESIGN RELATIVE TO THE RANDOMIZED
COMPLETE BLOCK DESIGN


The formula

$$\frac{2(p - 1)}{p^3 - 1}\left[\frac{n_1}{w + (r - 1)w'} + \frac{n_2}{2w + (r - 2)w'} + \cdots + \frac{n_r}{rw}\right],$$

where r = number of replicates, $p^3$ = number of varieties, $n_1$ = number
of effects and interactions confounded in (r − 1) replicates,
$n_2$ = number of effects and interactions confounded in (r − 2) replicates,
...,
$n_{r-1}$ = number of effects and interactions confounded in 1 replicate, and
$n_r$ = number of effects and interactions not confounded in any replicate,
was given by Kempthorne and Federer [4] as the average error variance
for the comparison of the mean difference of any two adjusted variety
means. The average effective error variance per plot will be (no. of
replicates = 4)/2 times this quantity. For the numerical example the
average effective error variance is

$$\frac{4}{13}\left[\frac{6}{2w + 2w'} + \frac{4}{3w + w'} + \frac{3}{4w}\right] = \frac{4}{13}\left[\frac{6}{0.8945378} + \frac{4}{1.1918705} + \frac{3}{1.4892032}\right] = 3.716.$$


The efficiency of this incomplete block design relative to the randomized
complete block design is the ratio of the two average variances, or

$$\frac{5.96}{3.716} = 160 \text{ percent}.$$
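
The efficiency calculation can be sketched as follows. The counts of effects (6 confounded in two replicates, 4 in one, 3 in none) and the weights are those quoted in the text; the function name and the dictionary layout are illustrative assumptions.

```python
w, w_prime = 0.3723008, 0.0749681   # intrablock and interblock weights
r, p = 4, 3                          # replicates, prime number p

# Number of effects and interactions confounded in j replicates.
effects_by_confounded_reps = {2: 6, 1: 4, 0: 3}


def average_variance_of_difference(w, w_prime, r, p, counts):
    """Average error variance of a difference of two adjusted variety
    means; an effect confounded in j replicates carries total weight
    (r - j) * w + j * w_prime."""
    total = sum(n / ((r - j) * w + j * w_prime) for j, n in counts.items())
    return 2 * (p - 1) / (p ** 3 - 1) * total


avg_var_diff = average_variance_of_difference(
    w, w_prime, r, p, effects_by_confounded_reps)

# Average effective error variance per plot is r / 2 times that quantity.
effective_error_variance = r / 2 * avg_var_diff

# Efficiency relative to the randomized complete block error mean square.
efficiency = 5.96 / effective_error_variance

print(round(effective_error_variance, 3), round(100 * efficiency))
```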


Since the efficiency of the incomplete block design is rather large (in this 
case the efficiency probably is inflated due to the choice of a long narrow 
replicate for the complete block) the adjustments of the variety means 
are expected to be appreciable and should be made. If the relative 
efficiency were small the adjustments for the variety means would be 
small and the adjusted variety means should differ little if any from the 
unadjusted means. 


III. ADJUSTED VARIETY MEANS


The mean yield of any treatment combination or variety yield,
$a_i b_j c_k$, may be expressed [4] in terms of the means of the pseudo-effects.
The mean yield of the variety $a_i b_j c_k$ may be expressed as

$$(A)_i + (B)_j + (AB)_{i+j} + (AB^2)_{i+2j} + (C)_k + (AC)_{i+k} + (AC^2)_{i+2k} + (BC)_{j+k} + (BC^2)_{j+2k}$$
$$+ (ABC)_{i+j+k} + (ABC^2)_{i+j+2k} + (AB^2C)_{i+2j+k} + (AB^2C^2)_{i+2j+2k} - 12\bar{x},$$

where $\bar{x}$ is the mean of the experiment and the main effects and interactions
are on a mean per plot basis. The subscripts i, j, and k or any
combination give the level of the effect or interaction when all subscripts
are reduced modulo p. The unadjusted or the adjusted means may be
obtained from the above formula: the former may be obtained when the
effect is given equal weights in all replicates and the latter when the
effect is weighted inversely to the variance with which it is estimated in
each replicate. To illustrate the use of the formula in obtaining the
unadjusted variety totals (or means), the totals in Table 3 and the level
of the effect in Table 7 are needed. The unadjusted total for variety
001 is

$$\frac{1}{9}\left[(A)_0 + (B)_0 + (AB)_0 + (AB^2)_0 + (C)_1 + (AC)_1 + (AC^2)_2 + (BC)_1 + (BC^2)_2 + (ABC)_1 + (ABC^2)_2 + (AB^2C)_1 + (AB^2C^2)_2\right] - 12\bar{x}$$

$$= \frac{1}{9}[1085.1 + 1069.1 + 1060.3 + 1063.6 + 1090.2 + 1075.7 + 1070.9 + 1092.8 + 1066.2 + 1082.5 + 1076.0 + 1074.2 + 1080.4] - 1437.9111 = 116.20$$

(or a mean of 29.050), where 1085.1 = 290.4 + 284.8 + 256.9 + 253.0,
etc.; 1437.9111 = 12(3235.3)/27; and the divisor for each effect is $3^2$
($p^2$ in general) since there are 9 yields making up each level of the effect.
The above total for variety
001 agrees with that obtained by adding the yields for this variety in
each of the 4 replicates, which is the procedure usually followed in
obtaining the unadjusted variety total yields.
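
The bookkeeping of levels can be sketched in code. The exponent vectors below encode the 13 pseudo-effects in the order used above, the level of each effect for variety ijk is the corresponding linear form reduced modulo p, and the 13 level totals are those quoted for variety 001. The names `EFFECTS` and `levels` are illustrative assumptions, not notation from the paper.

```python
P = 3  # prime number of levels per factor

# Exponents (a, b, c) of each effect A^a B^b C^c.
EFFECTS = {
    "A": (1, 0, 0), "B": (0, 1, 0), "AB": (1, 1, 0), "AB2": (1, 2, 0),
    "C": (0, 0, 1), "AC": (1, 0, 1), "AC2": (1, 0, 2), "BC": (0, 1, 1),
    "BC2": (0, 1, 2), "ABC": (1, 1, 1), "ABC2": (1, 1, 2),
    "AB2C": (1, 2, 1), "AB2C2": (1, 2, 2),
}


def levels(i, j, k, p=P):
    """Level of every effect for variety (i, j, k), subscripts mod p."""
    return {name: (a * i + b * j + c * k) % p
            for name, (a, b, c) in EFFECTS.items()}


# Totals over the 4 replicates of the levels in which variety 001 falls,
# copied from the text in the same order as EFFECTS.
level_totals_001 = [1085.1, 1069.1, 1060.3, 1063.6, 1090.2, 1075.7, 1070.9,
                    1092.8, 1066.2, 1082.5, 1076.0, 1074.2, 1080.4]
grand_total = 3235.3

# Unadjusted total: (sum of level totals) / p^2 minus 12 times the mean
# variety total (grand total / 27).
unadjusted_total_001 = (sum(level_totals_001) / P ** 2
                        - 12 * grand_total / P ** 3)

print(levels(0, 0, 1))
print(round(unadjusted_total_001, 2), round(unadjusted_total_001 / 4, 3))
```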

Since some of the main effects and interactions are confounded with
incomplete block differences in some of the replicates they will have a
variance of $\sigma_e^2 + 3\sigma_b^2 = 1/w'$ in the replicates in which they are
confounded. The unconfounded effects and interactions will be estimated
with a variance $\sigma_e^2 = 1/w$. If the total of the level of an effect (Table 3)
is weighted inversely to the variance with which it is estimated in the
various replicates then a weighted total (of 9 plots) of the effects may be
obtained (last 3 columns of Table 3). For example, A is confounded
in replicates I and II and unconfounded in III and IV with estimates of
variance in I and II of $\sigma_e^2 + 3\sigma_b^2 = 1/w'$ and of $\sigma_e^2 = 1/w$ in replicates
III and IV. The weighted mean (of 9 plots) for $(A)_0$ is

$$\frac{w'(290.4 + 284.8) + w(256.9 + 253.0)}{2w + 2w'} = 260.4226,$$

where w′ = 0.0749681 and w = 0.3723008. In a like manner the mean
(of 9 plots) for $(ABC)_1$, which is unconfounded in all 4 replicates, is

$$\frac{w(290.3 + 272.3 + 265.5 + 254.4)}{4w} = 270.6250.$$

The remaining means of 9 plots (Table 3) are obtained in the same manner.
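
The inverse-variance weighting just described amounts to a weighted average of the replicate totals. A minimal sketch, using the weights and level totals quoted in the text (the function name is an illustrative assumption):

```python
w, w_prime = 0.3723008, 0.0749681


def weighted_level_mean(confounded, unconfounded, w, w_prime):
    """Weighted mean (of 9 plots) of one level of an effect: replicate
    totals in which the effect is confounded get weight w_prime, the
    others get weight w."""
    numerator = w_prime * sum(confounded) + w * sum(unconfounded)
    denominator = w_prime * len(confounded) + w * len(unconfounded)
    return numerator / denominator


# (A)0: confounded in replicates I and II, unconfounded in III and IV.
a0 = weighted_level_mean([290.4, 284.8], [256.9, 253.0], w, w_prime)

# (ABC)1: unconfounded in all 4 replicates, so the weights cancel.
abc1 = weighted_level_mean([], [290.3, 272.3, 265.5, 254.4], w, w_prime)

print(round(a0, 4), round(abc1, 4))
```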

The adjusted variety means (or totals) are obtained from the formula
given above when the weighted means (last 3 columns of Table 3) for the
various levels of the effects are used. The particular level to use for each
variety may be obtained from the first 14 columns of Table 7. These
values need not be reproduced for each experiment but may be worked
out for each value of p and then used again for all succeeding experiments.
For variety 000, i = 0, j = 0, and k = 0 and the zero level for
all effects is used to obtain the adjusted mean for this variety combination.
For variety 001, i = 0, j = 0, and k = 1, the adjusted variety
mean is obtained from the following levels of the effects,

$$(A)_0 + (B)_0 + (AB)_0 + (AB^2)_0 + (C)_1 + (AC)_1 + (AC^2)_2 + (BC)_1 + (BC^2)_2 + (ABC)_1 + (ABC^2)_2 + (AB^2C)_1 + (AB^2C^2)_2 - 12\bar{x}$$

$$= \frac{1}{9}[260.4226 + 265.9288 + 262.4743 + 266.8972 + 272.3838 + 272.5314 + 265.0869 + 278.5847 + 271.0529 + 270.6250 + 274.2139 + 268.5500 + 270.1000] - 359.4778 = 29.2835.$$

The remainder of the adjusted variety means (Table 7) are obtained in
the same manner.
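
The arithmetic for variety 001 can be verified directly. The sketch assumes only the 13 weighted means quoted above and the grand total 3235.3 over 108 plots; the variable names are illustrative.

```python
# Weighted means (of 9 plots) of the 13 levels that apply to variety 001.
weighted_levels_001 = [260.4226, 265.9288, 262.4743, 266.8972, 272.3838,
                       272.5314, 265.0869, 278.5847, 271.0529, 270.6250,
                       274.2139, 268.5500, 270.1000]

# Mean of the experiment per plot.
experiment_mean = 3235.3 / 108

# Divide by 9 to put each level on a per-plot basis, then remove the
# 12 surplus copies of the general mean.
adjusted_mean_001 = sum(weighted_levels_001) / 9 - 12 * experiment_mean

print(round(12 * experiment_mean, 4), round(adjusted_mean_001, 4))
```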

If only the first 3 replicates had been used the design and analysis by 
the above computational procedure would result in the same adjusted 
variety means as the method described by Yates [6]. The computational 
procedure described above is more general than Yates’ [6] method in 
some respects in that the method may be easily extended to the case of 
4, 5 or more replicates for $p^3$ varieties.


IV. STANDARD ERRORS OF A DIFFERENCE BETWEEN ADJUSTED VARIETY MEANS 


For the numerical example the average standard error of a mean 
difference between any 2 adjusted variety means is (see Kempthorne 
and Federer [4]) 


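$$\sqrt{\frac{2(p - 1)}{p^3 - 1}\left[\frac{n_1}{w + (r - 1)w'} + \frac{n_2}{2w + (r - 2)w'} + \cdots + \frac{n_r}{rw}\right]} = \sqrt{\frac{2}{13}\left[\frac{6}{0.8945378} + \frac{4}{1.1918705} + \frac{3}{1.4892032}\right]} = 1.363,$$

where $r$, $n_1$, $n_2$, ..., $n_r$, $w$ and $w'$ are as defined previously.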

Since this design is an unbalanced lattice design, some of the varieties 
occur together in an incomplete block while others do not. The standard 
error of a mean difference for any 2 varieties which occur together in an 
incomplete block is 


$$\sqrt{\frac{2}{9}\left[\frac{3}{2w + 2w'} + \frac{3}{3w + w'} + \frac{3}{4w}\right]} = 1.324.$$


The above standard error is applicable to such comparisons as the 
adjusted mean yield of variety 000 with any of the following adjusted 
variety means: 001, 002, 010, 020, 100, 200, 112, 221. 

A second type of comparison, such as the adjusted mean yield of
variety 000 with the adjusted means of any of the varieties 012, 021,
102, 110, 201, 220, would have a standard error of a mean difference equal
to

$$\sqrt{\frac{2}{9}\left[\frac{4}{2w + 2w'} + \frac{4}{3w + w'} + \frac{1}{4w}\right]} = 1.374.$$


TABLE 7. TABLE FOR ADJUSTING VARIETY TOTALS BY METHOD OF EFFECTS AND INTERACTIONS TOTALS FOR $3^3$ VARIETIES IN 4 REPLICATES. UNADJUSTED AND ADJUSTED MEANS


The third type of comparison for this design, which would involve
such comparisons as the adjusted mean yield of variety 000 with the
adjusted means of any of the varieties 011, 022, 101, 111, 120, 121, 122,
202, 210, 211, 212, 222, would have a standard error of

$$\sqrt{\frac{2}{9}\left[\frac{5}{2w + 2w'} + \frac{2}{3w + w'} + \frac{2}{4w}\right]} = 1.383.$$

Since none of the above standard errors differ materially from the
average standard error, 1.363 may be used as the standard error of a
difference for the comparison of any 2 adjusted variety means.
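
The three standard errors and their relation to the average can be checked with a short sketch. It assumes that a comparison's variance is 2/9 times the sum, over the effects whose levels differ between the two varieties, of the reciprocal total weight of each effect. The numerators (3, 3, 3), (4, 4, 1) and (5, 2, 2) are a reconstruction from the partly illegible formulas, checked only against the printed values and the average; names are illustrative.

```python
from math import sqrt

w, w_prime = 0.3723008, 0.0749681

# Total weights of effects confounded in two, one, and no replicates.
weight_two = 2 * w + 2 * w_prime
weight_one = 3 * w + w_prime
weight_none = 4 * w


def se_of_difference(n_two, n_one, n_none):
    """Standard error of a difference when n_two, n_one, n_none effects of
    each confounding class differ in level between the two varieties."""
    variance = 2 / 9 * (n_two / weight_two + n_one / weight_one
                        + n_none / weight_none)
    return sqrt(variance)


se_same_block = se_of_difference(3, 3, 3)   # varieties sharing a block
se_second_type = se_of_difference(4, 4, 1)
se_third_type = se_of_difference(5, 2, 2)

# Average standard error from the general formula (6, 4, 3 effects).
se_average = sqrt(2 / 13 * (6 / weight_two + 4 / weight_one + 3 / weight_none))

# Variety 000 has 8, 6, and 12 comparisons of the three types; pooling the
# variances with those frequencies recovers the average variance.
pooled_variance = (8 * se_same_block ** 2 + 6 * se_second_type ** 2
                   + 12 * se_third_type ** 2) / 26

print(round(se_same_block, 3), round(se_second_type, 3),
      round(se_third_type, 3), round(se_average, 3))
```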


V. COEFFICIENT OF VARIATION 


The coefficient of variation for the 3-dimensional lattice with one
restriction is the square root of the average effective error mean square
divided by the mean of the experiment. The coefficient of variation
for the numerical example in Table 1 is

$$\frac{\sqrt{3.716}}{29.96} = 6.4 \text{ percent}.$$


A RELATION BETWEEN THE LOGARITHMIC, POISSON,
AND NEGATIVE BINOMIAL SERIES


M. H. QUENOUILLE 


Marischal College, Aberdeen, Scotland 


IN RECENT YEARS, many applications [3, 4] have been found for the
logarithmic series developed by R. A. Fisher [1] in an investigation
of the frequency distribution of numbers of species of animals obtained
in random samples. Fisher derived this distribution by first considering
a Poisson distribution with mean m, since this is the usually observed
distribution where we are dealing with homogeneous material. However,
where we are dealing with heterogeneous material, it is no longer possible
to assume that m is fixed for all samples, so that Fisher assumed that it
was distributed as a Gamma-type variable in the form

$$\frac{1}{(k - 1)!}\, p^{-k}\, m^{k-1}\, e^{-m/p}\, dm.$$

With this assumption, we may consider the superposition of a set of
Poisson distributions as resulting in one overall distribution: the
negative-binomial distribution with parameter p/(1 + p) = x, say, and index
k. The probability of observing a sample of size s is then

$$\frac{(k + s - 1)!}{(k - 1)!\, s!}\, \frac{p^s}{(1 + p)^{k+s}}$$

or the coefficient of $t^s$ in the expansion of

$$(1 - x)^k (1 - xt)^{-k}.$$

In particular, when k → 0, this gave rise to a distribution whose first
term tended to become infinite. However, upon excluding this term as
being, in general, unobservable, Fisher obtained the logarithmic
distribution:

$$\alpha x,\ \frac{\alpha x^2}{2},\ \frac{\alpha x^3}{3},\ \ldots$$

where $\alpha = -1/\log_e (1 - x)$.

More recently [2], an alternative relation between these discrete
series was noted to exist and to be of some practical importance in
bacterial counts where counts of individual bacteria and colony counts
are taken. It was found that whereas the colony counts followed
Poisson’s distribution, the numbers of bacteria per colony were
logarithmically distributed, and that, consequently, the bacterial counts were
distributed in the negative binomial form. No proof of this relation was
provided and it is not difficult to derive, but, since it is believed that its
possible applications extend beyond the field of bacteriology (e.g.
quadrat sampling with over-dispersion), a simple proof is given below.

Suppose that the number of groups observed on any one occasion is
distributed in the Poisson form, so that the probability of observing n
groups is

$$P(n \text{ groups}) = \frac{e^{-m} m^n}{n!}.$$

Then the probability of observing s individuals in any sample is

$$P(s \text{ individuals}) = \sum_{n} P(n \text{ groups}) \times P(s \text{ individuals in } n \text{ groups}).$$

Now the probability of observing s individuals in any one group is

$$\alpha x^s / s$$

or the coefficient of $t^s$ in

$$-\alpha \log_e (1 - xt).$$

Likewise, the probability of observing s individuals in n groups is the
coefficient of $t^s$ in

$$[-\alpha \log_e (1 - xt)]^n.$$

Thus, we have

$$P(s \text{ individuals}) = \text{Coefficient of } t^s \text{ in } \sum_{n=0}^{\infty} \frac{e^{-m} m^n\, [-\alpha \log_e (1 - xt)]^n}{n!}$$

$$= \text{Coefficient of } t^s \text{ in } \exp[-m - \alpha m \log_e (1 - xt)]$$

$$= \text{Coefficient of } t^s \text{ in } e^{-m} (1 - xt)^{-\alpha m}$$

$$= \frac{(\alpha m + s - 1)!}{(\alpha m - 1)!\, s!}\, x^s (1 - x)^{\alpha m}, \quad \text{since } (1 - x)^{\alpha} = e^{-1}.$$

This is the same as the (s + 1)th term in a negative binomial series with
parameter x and index $\alpha m$. Consequently, the probability distribution
of the number of individuals in random samples is the negative binomial.
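
The identity can also be checked numerically. The sketch below builds the compound distribution directly, as a Poisson-weighted sum of n-fold convolutions of the logarithmic series, and compares it with the negative binomial with parameter x and index αm. The parameter values m = 2 and x = 0.6 are arbitrary illustrative choices, and all function names are assumptions.

```python
from math import exp, lgamma, log


def log_series_pmf(x, smax):
    """Logarithmic series: P(s) = alpha * x^s / s for s >= 1."""
    alpha = -1.0 / log(1.0 - x)
    return [0.0] + [alpha * x ** s / s for s in range(1, smax + 1)]


def convolve(a, b, smax):
    """Distribution of the sum of two independent counts, up to smax."""
    out = [0.0] * (smax + 1)
    for i, ai in enumerate(a):
        if ai == 0.0:
            continue
        for j, bj in enumerate(b):
            if i + j > smax:
                break
            out[i + j] += ai * bj
    return out


def compound_poisson_log(m, x, smax, nmax=80):
    """P(s individuals) = sum_n P(n groups) * P(s individuals in n groups)."""
    group = log_series_pmf(x, smax)
    total = [0.0] * (smax + 1)
    n_fold = [1.0] + [0.0] * smax   # zero groups give zero individuals
    weight = exp(-m)                # Poisson probability of n = 0 groups
    for n in range(nmax + 1):
        for s in range(smax + 1):
            total[s] += weight * n_fold[s]
        n_fold = convolve(n_fold, group, smax)
        weight *= m / (n + 1)
    return total


def neg_binomial_pmf(index, x, smax):
    """(s + 1)th term of the negative binomial with parameter x."""
    return [exp(lgamma(index + s) - lgamma(index) - lgamma(s + 1)
                + s * log(x) + index * log(1.0 - x))
            for s in range(smax + 1)]


m, x, smax = 2.0, 0.6, 15
alpha = -1.0 / log(1.0 - x)
compound = compound_poisson_log(m, x, smax)
direct = neg_binomial_pmf(alpha * m, x, smax)
largest_gap = max(abs(a - b) for a, b in zip(compound, direct))
print(largest_gap)
```

The factorials of the text are written with the gamma function here because the index αm is not in general an integer.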

Conversely, the assumption of any two of the distributions holding 
leads to the third distribution, provided that the parameters of the loga- 
rithmic and negative binomial distributions are equal when these are 
the known distributions. 

Finally, it is worth noting that by this approach any disparity
between the mean and the variance of a set of samples can be accounted
for in terms of the parameter x. We have, in fact,

$$\frac{\text{variance}}{\text{mean}} = \frac{1}{1 - x}.$$

This formula may be used to gauge x where over-dispersion is apparent.


THE STATISTICAL ANALYSIS OF INSECT COUNTS
BASED ON THE NEGATIVE BINOMIAL DISTRIBUTION


F. J. ANSCOMBE


Lecturer in Mathematics, Cambridge University, England


THIS NOTE GIVES a summary of the results of a mathematical investigation
into the sampling theory of the negative binomial distribution,
carried out during 1947 in the Statistical Department of Rothamsted
Experimental Station. The work is a development of that of Fisher [5].
A full account will be given later elsewhere.


1. USE OF NEGATIVE BINOMIAL DISTRIBUTION 


Insect counts in the field (and other population counts) are often
fitted fairly well by a negative binomial distribution. This is described
by two constants, the mean m and the exponent k. The variance of the
distribution is

$$m + \frac{m^2}{k}, \tag{1}$$

the expected frequency of zeros is

$$p_0 = \left(1 + \frac{m}{k}\right)^{-k}, \tag{2}$$

and the chance of observing any positive count r is

$$p_r = \frac{\Gamma(k + r)}{r!\, \Gamma(k)} \left(\frac{m}{m + k}\right)^{r} p_0. \tag{3}$$

The Poisson distribution is obtained as the limit as k → ∞. At the other
end of the scale, as k → 0, we approach the logarithmic series [6].
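
A minimal sketch of the distribution in the (m, k) parametrization follows. It assumes the standard negative binomial form, with zero class $(1 + m/k)^{-k}$ and variance $m + m^2/k$, and checks those two properties and the Poisson limit numerically. The values m = 3 and k = 0.5 are arbitrary illustrations and the function name is an assumption.

```python
from math import exp, lgamma, log


def nb_pmf(r, m, k):
    """Chance of a count r under a negative binomial with mean m and
    exponent k, computed on the log scale for numerical stability."""
    log_p = (lgamma(k + r) - lgamma(k) - lgamma(r + 1)
             + r * log(m / (m + k)) - k * log(1.0 + m / k))
    return exp(log_p)


m, k = 3.0, 0.5
probs = [nb_pmf(r, m, k) for r in range(2000)]  # tail beyond is negligible

mean = sum(r * p for r, p in enumerate(probs))
variance = sum(r * r * p for r, p in enumerate(probs)) - mean ** 2

# Poisson limit: with a very large exponent the zero class approaches e^-m.
poisson_like_zero = nb_pmf(0, m, 1.0e6)

print(round(mean, 6), round(variance, 6), round(probs[0], 6))
```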

If we have several sets of counts on the same species of insect, from 
different districts or after different treatments, we may find that the 
mean m varies between the sets, but k remains approximately the same.
To analyse such data statistically, we need to obtain a pooled estimate 
of k from all sets of counts and estimate the mean m separately for each 
set. There is some theoretical evidence [7] to show that k depends on the 
intrinsic power of the species to reproduce itself, while m depends on 
external factors. To try to fit negative binomial distributions with a 
common value of k to sets of counts on the same species is therefore a
reasonable procedure.


2. ESTIMATION OF k FROM A SINGLE LARGE SAMPLE


We consider first the estimation of m and k from a single set of counts
(made under uniform conditions). Suppose N counts have been made
(i.e. the numbers of insects on N experimental units are counted), and
$n_0$ of these counts are zeros (i.e. no insects were found on $n_0$ units). Let
$\bar{r}$ be the average number of insects found per count (i.e. the total number
of insects counted, divided by N). Then $\bar{r}$ is the best estimate of m.
The best estimate of k, by the method of maximum likelihood or minimum
$\chi^2$, is tedious to find; and in practice we require a shorter method.
Three methods are useful and efficient in various circumstances.


(i) We equate the variance of the sample to the variance of
the distribution given above at (1). If $s^2$ is the sample
variance, defined as the sum of squares of deviations
of the N counts from $\bar{r}$, divided by N − 1, we get as our
estimate of k

$$k = \frac{\bar{r}^2}{s^2 - \bar{r}}. \tag{4}$$


(ii) We equate the observed proportion of zeros to the expected
proportion given at (2) above, i.e. we choose k by
successive approximation to satisfy

$$\frac{n_0}{N} = \left(1 + \frac{\bar{r}}{k}\right)^{-k}. \tag{5}$$


(iii) We make a transformation of the actual counts r to a
new variable y having a variance depending on k but not
on m, and rather more nearly normally distributed
[1]. We can then estimate k from the observed variance
of y. The simplest transformation available is

$$y = \log_{10}\left(r + \tfrac{1}{2}k\right). \tag{6}$$


If k is between 2 and 5, this transformation may be
used whenever m is not too small, say at least 15. If k
is less than 2 or above 5, the transformation may still
be used if m is large enough; but m may need to be
considerably higher than 15 (so very much higher, in
fact, when k < 1, that the possibility of use is almost
ruled out). Under these conditions, the expected variance
of y is approximately independent of m and equal
to $0.1886\,\psi'(k)$, where $\psi'(k)$ denotes the trigamma function,
i.e. the second derivative of $\ln \Gamma(k)$ with respect to
k, and has been tabulated fully by Davis [4]. Roughly,
$\psi'(k) = 1/(k - \tfrac{1}{2})$ when k is above 2, and $1/k^2$ if k is
near 0. The procedure for finding k is to guess a value,
use the above transformation (6), find the sample variance
$s_y^2$ of y, equate this to the expected variance and so
get a new estimate of k. The process is repeated if the
new value of k is much different from the old one.

[Figure: Efficiency of Estimation of k, with contours for Method (i) and Method (ii).]

For k not less than 2, a more elaborate transformation
may be used,

$$y = \sinh^{-1}\sqrt{\frac{r + c}{k - 2c}}. \tag{7}$$

This has an expected variance of $0.25\,\psi'(k)$. c is a
constant; its best value is 0.375 if k is large, but somewhat
smaller when k is small, 0.2 when k = 2. m may
now be as low as 4 or 5. The transformation has not
been investigated for k < 2.
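
Methods (i) and (ii) can be sketched as follows. The moment estimate solves $s^2 = \bar{r} + \bar{r}^2/k$ for k, and the zero-class estimate solves $n_0/N = (1 + \bar{r}/k)^{-k}$ by bisection in place of successive approximation by hand. The ten counts are hypothetical, invented purely for illustration, and the function names are assumptions.

```python
from math import exp, log, log1p, sqrt


def k_by_moments(counts):
    """Method (i): equate the sample variance to m + m^2 / k."""
    n = len(counts)
    mean = sum(counts) / n
    s2 = sum((c - mean) ** 2 for c in counts) / (n - 1)
    if s2 <= mean:
        raise ValueError("no over-dispersion: variance does not exceed mean")
    return mean ** 2 / (s2 - mean)


def k_by_zeros(counts, iterations=200):
    """Method (ii): solve n0 / N = (1 + mean / k)^(-k) for k by bisection
    on a logarithmic scale. k * log(1 + mean / k) increases with k."""
    n = len(counts)
    n0 = counts.count(0)
    mean = sum(counts) / n
    if n0 == 0 or n0 / n <= exp(-mean):
        raise ValueError("zero class gives no admissible estimate of k")
    target = -log(n0 / n)
    lo, hi = 1e-8, 1e8
    for _ in range(iterations):
        mid = sqrt(lo * hi)
        if mid * log1p(mean / mid) < target:
            lo = mid
        else:
            hi = mid
    return sqrt(lo * hi)


# Hypothetical counts on N = 10 units (not data from the paper).
counts = [0, 0, 0, 0, 0, 1, 1, 2, 3, 13]
k_moment = k_by_moments(counts)
k_zero = k_by_zeros(counts)
print(round(k_moment, 3), round(k_zero, 3))
```

With so few counts the two estimates need not agree closely; the sketch illustrates the calculations only, not their efficiency.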


Of these three methods, (i) and (ii) are quite easy to use, while (iii) is
rather more bother. Roughly speaking, we use (i) if k > 1, (ii) if k < 1.
The actual efficiency of the methods, as compared with the maximum
likelihood method, is indicated in the diagram, which shows 90%, 70%,
and 50% efficiency contours for methods (i) and (ii). Method (iii) is
only appropriate when m is not small, and then it is rather more efficient
than (i).

The errors of estimation of m and k are independent, if N is large.
$\bar{r}$ is always a fully efficient estimate of m.


3. ESTIMATION OF k FROM SEVERAL SAMPLES 


If we have several sets of counts and wish to estimate a common 
value of k, we may use developments of the three methods just described. 


(i) We guess a value of k, and calculate for each set of
counts a quantity

$$T = \frac{(N - 1)s^2 - (N - 1 - 1/k)\,\bar{r}\,(1 + \bar{r}/k)}{(\bar{r} + k)^2}. \tag{8}$$

Our object is to guess a value of k which makes the
sum of these expressions T from all sets equal to zero.
The process converges quite quickly if the working is
suitably arranged. The divisor $(\bar{r} + k)^2$ is merely a
weighting factor and can be replaced by $\bar{r}^2$ if $\bar{r}$ is always
rather larger than k (this making the working easier).
It is assumed here that N is not very small. Presumably
10 would be large enough, but not 2.

(ii) We guess a value of k and calculate for each set of
counts a quantity

$$U = \log\left(1 + \frac{\bar{r}}{k}\right)\left[n_0 - \left(1 + \frac{\bar{r}}{k}\right)^{-k}\left(N - \frac{(k + 1)\,\bar{r}}{2(\bar{r} + k)}\right)\right]. \tag{9}$$

Our object is to choose a value of k which makes the
sum of these expressions U for all sets equal to zero.
Again, the process converges quite quickly if the working
is suitably arranged. The multiplier $\log (1 + \bar{r}/k)$
is a weighting factor, and may be replaced by $\log \bar{r}$ if $\bar{r}$
is always large and much larger than k. In any case,
it is not the optimum weighting factor, which is much
more troublesome to calculate. N is again assumed to
be not very small.

(iii) We calculate the variance of the transformed variable y
for each set, pool the answers and equate to the theoretical
variance. The method applies even if N is as small
as possible, namely 2. (No estimate of k could be
derived from a single observation.) It is, however,
subject to the restrictions mentioned above on the
values of m and k for a suitable transformation to exist.
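
The quantities T and U can be computed per set of counts as in the sketch below. The algebraic form of U and the use of common (base 10) logarithms are a reconstruction from a badly printed formula: they are assumptions, supported only by the fact that they reproduce two rows of the table in §5 (six zeros, three ones and a two; five zeros and five ones) at k = 0.5. Function names are illustrative.

```python
from math import log10


def t_stat(counts, k):
    """Quantity T of method (i) for one set of N counts."""
    n = len(counts)
    mean = sum(counts) / n
    sum_sq_dev = sum((c - mean) ** 2 for c in counts)   # (N - 1) * s^2
    expected = (n - 1 - 1.0 / k) * mean * (1.0 + mean / k)
    return (sum_sq_dev - expected) / (mean + k) ** 2


def u_stat(counts, k):
    """Quantity U of method (ii) for one set of N counts (assumed form,
    with base-10 logarithm as the weighting factor)."""
    n = len(counts)
    n0 = counts.count(0)
    mean = sum(counts) / n
    ratio = 1.0 + mean / k
    expected_zeros = ratio ** (-k) * (n - (k + 1.0) * mean / (2.0 * (mean + k)))
    return log10(ratio) * (n0 - expected_zeros)


def pooled(stat, sets_of_counts, k):
    """Sum of T or U over all sets; k is adjusted until this is zero."""
    return sum(stat(counts, k) for counts in sets_of_counts)


eighth_line = [0] * 6 + [1] * 3 + [2]   # six zeros, three ones, one two
ninth_line = [0] * 5 + [1] * 5          # five zeros, five ones

for row in (eighth_line, ninth_line):
    print(round(u_stat(row, 0.5), 2), round(t_stat(row, 0.5), 1))
```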


4. DESIGN AND ANALYSIS OF EXPERIMENTS 


Let us consider an experiment in which t treatments are compared
in “randomized blocks” of tN experimental units, these units having
been divided at random into t sets of N units for the application of the
treatments. There may be one or several such blocks. The observations
consist of counting the number of insects on each experimental unit. If
negative binomial distributions with a common value of k are to be
fitted to the sets of N counts for each treatment in each block, k will be
estimated by one of the methods just described. The analysis will then
proceed on the totals of each set of N counts. The totals have negative
binomial distributions with exponent equal to Nk, and may be transformed
as already explained to permit of an analysis of variance. (The
transformation is different from what may have been used in determining
k, since now the exponent is Nk.) The residual error mean
square in the analysis of variance may be compared with the expected
variance for the transformation, given above, as a test of heterogeneity
in the observations.

If the estimation of k is by method (iii), N may be as small as 2, but
the infestations must not be too low. For methods (i) and (ii) N will
need to be larger, and in the absence of more precise information it seems
reasonable to recommend that N should be at least 10. These remarks
amplify Beall’s suggestion [3] that N should be at least 2 in experiments
of this kind, so that k can be estimated.

If it is desired to avoid the assumption of a negative binomial form 
of distribution, with constant exponent k, it would be possible to proceed 
by method (iii) to derive an estimate of k and then use the transformation 
so defined for all further work. This would probably be as satisfactory 
a transformation as could be used, unless some precise assumption 
(other than the negative binomial one) were made about the distribution 
of the observations. The final analysis of variance would then be based 
on the totals of the individual transformed counts per set of N units 
and not on a direct transformation of the total count from the N units. 
In fact, the use of the transformation 


(10) y = log (r + 1) 


in this way is well known and common where the standard deviation of 
r appears to be roughly proportional to the mean. 
TABLE
Number of eggs on ten shoots          $\bar{r}$       U       T

(assuming k = 0.5)
OM 5b Bowe 0 a ea e © (Asies) 0.0 0.00 0.0 
O25 Tey tenes eee er eres (7 sites) 0.1 0.00 0.2 
(2. Deer ea eee (3 sites) 0.2 Oy wit 3.3 
@8, 12. Mee fi A ee eae -: —0.04 =), 
(2k Siege ete ee ane (2 sites) 0.3 0.27 7.4 
08, 1, 4 GT, 0:5 0.36 7s 
08, 2,3 Sg NT eta eee Se 0.36 3.5 
OORT AD cece eet (3 sites) i —0.24 —2.5 
02, ie i o-Ps fe —0.54 —4.5 
OS CMe Sa oe ee oe (2 sites) 0.6 0.87 19° 1 
08, 12, 22 oe Oo « —0.16 eee) 
0°, 18, We O77 —0.45 —3.9 
(Os, 1B, A, @ et See se TOR 0.8 0.04 0.6 
082255 aa Ae’, Sc ee es 0.9 0.59 535 00 
08, 1, 2, 3, 4 1.0 0.25 —0.4 
03,5; 6 © ial 1.36 9.5 
OG, iL & ee eee iL 85) 1.37 10.6 
(AE ae ere coder and! We Ge —0.48 —3.4 
ee Me 1.6 Qal2 23.9 
0°, my hy 8) 2.2 0.70 Srl 
ees i 62, @ . - 0.70 30 
O, th, & Gy UG 2.6 ils fee 12.6 
05, 1, 2, 42, 17 218 hae! 10.9 
OB 29 pitt ec Te eae me) 5 2.9 4.51 53.5 
OS -91 20a (ane ae ee eer gig oe 3.3 1.42 10.7 
02, 18) 2, 4, 6, 8, 10 3.3 1123 —4.3 
O42, MO), iil, Pe 3.5 2.48 3.2 
08, 16, 24 4.0 4.66 20.7 


5. A NUMERICAL EXAMPLE 


I have not encountered any experimental observations of the sort
just considered. (Beall [3] gives some examples, however.) To illustrate
the methods of §3, I give here some counts of eggs of Aphis fabae made
by Dr. D. Price Jones in the course of a survey of the Eastern Counties
of England in 1947, which he has kindly allowed me to reproduce.
Ninety-four hedgerow spindle sites were visited, that had been cut down
the previous winter, so that the shoots were of one-year growth. At each
site ten shoots were removed and the A. fabae eggs on them subsequently
counted. ‘The counts are shown in the table arranged in order of in- 
TABLE (Continued)


Number of eggs on ten shoots          $\bar{r}$       U       T

(assuming k = 0.5)

Or leton te 4.1 0.89 14.7 
0°, 3, 4 Dil 12 é 4.3 0) 50! 0.1 
- "92 2 DF 1S. 5.0 —2.92 =i 
ayy 3, 5, 6 , 5.5 2.49 25.2 
- TE esate 8, 13, 33 6.3 —0.59 7.4 
2, 3; 4, 5% 6, 9,142 . 6.4 —2.86 WY) 
- Le 2 ro Od 12 1207 6.7 —0.52 —=(60),.0) 
ls aA Ow 25 6.8 —O7oll —2.2 
2 3, 6, 11, 147, 19 7.4 —2.80 —7.0 
ety O, 1A AR 8.0 —2.77 122 
= Ie ’9, 467912. 9.0 —0.17 TO 
es he 10, 13, 31,35 . 10.1 —2.67 —0.3 
02, SiGe tele LO lop 10.9 0.08 —4.4 
0; 32, 4, 5 6, 13715) 2a;930° 2 : cl —1.28 —4.5 
Oe " 2, AZ -G820, 21; 00 she 2 2. . Eee (oie Bin dl 
One; - ie 12, 42, 50 eet tases 5 (atest s 11.9 2.98 6.8 
i "32 Pia OPO, AD emer se hes, oo i 12.5 —2.57 0.0 
Aires A LS RES OU peach otc ote 2 “ —2.57 HAY, 
07; 2, "6, el Omelet SU eeSomcu dae 14.8 0.48 9.7 
P2630) SARA 91 38 GBs rn! oh 16.6 2.17 =0:9 
5 4* 117 19, 20, 31, 32,39 2... - ins ~2.41 =8.7 
3 At 14, 117% 23, 242.33, os 18.0 —2.39 iis 
(eG 71s, AO 149. 19.8 2.49 31.5 
4,9, 12, 17, 18, 22, 23, 24,34,70... 23.3 —2.25 =o 
22, 242, 31, 34, 36, 43, 44, 48,58 . . . 36.4 =2; 01 ~12.9 
OVA Sale) oo; 105, 40,749,104, LLO) 2 2 38.3 —0. 10 —5.8 
1, 8, 10, 18, 26, 32, 44, 52, 82,120 . . 39.3 =f 07 —5.9 
21, 35, 51, 59%, 70, 105, 120, 123, 163 . 80.6 Eas aii 


creasing $\bar{r}$. The eighth line, for example, indicates that at three sites
six shoots had no eggs, three had one and one had two eggs; while the
following line indicates that at one site there were five shoots without
eggs and five with one egg.

The values of U and T are given on the assumption that k = 0.5.
The sum of the U's is nearly zero, and 0.5 is close to the estimate of k
given by this method. For the range of values of m that appears to have
been encountered, method (ii) is considerably more efficient than method
(i), while method (iii) is inappropriate. We should therefore accept the
value of k given by method (ii), if any.

It appears, however, when we plot U against $\bar{r}$, that the value of k
is not constant but increases with m. Thus, if we consider the counts in
which $\bar{r}$ exceeds 4.0, method (ii) gives k equal to about 0.65; while for the
counts in which $\bar{r}$ is less than 4.0 k is in the neighbourhood of 0.3. The
effect is too marked to be attributed to the negative correlation between
$n_0$ and $\bar{r}$ that occurs in repeated sampling of the same population. We
observe a similar increase in k if we use method (i), plotting T against $\bar{r}$;
but now there is further cause of perplexity, in that the values of k
indicated by method (i) are appreciably lower than those of method (ii).
Method (i) indicates an overall value for k round about 0.35 and 0.5 for
the counts in which $\bar{r}$ exceeds 4.0. This discrepancy between methods (i)
and (ii) may perhaps be due to 10 being too low a value of N for both
methods to be accurate, or it may be due to a departure from the negative
binomial form of distribution.

Thus, to sum up, there is clear evidence that k increases somewhat as 
m increases (an effect already noticed with Myzus persicae on potato 
plants [2]), and a suggestion that the form of distribution may perhaps 
depart from an exact negative binomial. In such an extensive series of 
counts, in which 940 experimental units were observed and almost 
5,000 individuals (eggs) were counted, it is not surprising to find some 
contradiction of the simple hypothesis we started with. The same is 
to be expected with almost any kind of statistical material. Much 
attention has been given to investigating the validity of applying analy- 
sis of variance methods to yields in agricultural field experiments (with- 
out the question being entirely settled yet), and no such investigation of 
the validity in practice of the methods outlined in this paper has been 
undertaken. Our hypothesis, of negative binomial distributions with 
constant k, is the simplest we can make that is at all plausible; and the 
methods based on it are, if not elegant, at least not impossibly clumsy. 
It is suggested that no serious error will attend their use. 

Accordingly, in further work on Price Jones’s data, it would be reason- 
able to assume that k had a constant value of 0.5, if that facilitated the 
treatment. If we wished to correlate infestations at sites with other 
information about the sites, we could transform the total egg count per 
site, namely 10r̄, by the transformation 

$$ y = \sinh^{-1}\sqrt{\frac{10\bar{r} + \tfrac{3}{8}}{4.25}} $$

and treat this as a normal variable with error variance $\tfrac{1}{4}\psi'(5) = 0.055$. 
In fact, no very interesting correlations were observed, as the information 
about the sites was rather imprecise; and whatever associations could be 
perceived, visually, from scatter diagrams, were equally clear when 
untransformed counts were used. However, had such counts occurred 
in an experiment of the sort considered in §4, much clearer correlations 
would be expected; and the transformation would enable treatment 
effects to be investigated by analysis of variance. 
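
The transformation and its error variance can be checked numerically. The sketch below is only an illustration: the function names are hypothetical, and the additive constant 3/8 is an assumption inferred from the divisor 4.25 = 10(0.5) − 2(3/8) and from the quoted variance, which agrees with one quarter of the trigamma function at Nk = 5.

```python
import math


def trigamma_int(n):
    """Trigamma function psi'(n) for a positive integer n.

    Uses the exact identity psi'(n) = pi^2/6 - sum_{j=1}^{n-1} 1/j^2.
    """
    return math.pi ** 2 / 6 - sum(1.0 / j ** 2 for j in range(1, n))


def transform_site_total(total, n_units=10, k=0.5, c=0.375):
    """Inverse-sinh transform of a site total (sum of n_units counts).

    Assumes each unit count is negative binomial with exponent k, so the
    site total has exponent n_units * k; c = 3/8 is the assumed constant.
    """
    kk = n_units * k
    return math.asinh(math.sqrt((total + c) / (kk - 2 * c)))


# Approximate error variance of the transformed total: psi'(Nk) / 4.
error_variance = trigamma_int(5) / 4

if __name__ == "__main__":
    print(round(error_variance, 4))
    # A site whose ten shoots carry 233 eggs in all (mean 23.3 per shoot).
    print(round(transform_site_total(233), 3))
```

The transformed totals could then be carried into an ordinary analysis of variance, with the theoretical variance serving as a check on the observed residual mean square.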
QUERIES 


QUERY 66: An experiment has been established to examine the 
influence of planted forest cover on a number of soil characteristics. 
Three kinds of trees have been grown on plots forming a 
randomized-block study with four replications. Because the individual 
plots in each block have to be rather large, the soil characteristics are 
to be obtained by stratified random sampling; and it is desired to obtain 
estimates of sampling errors. For these purposes each plot has been di- 
vided into 5 strata, within each of which 2 soil samples will be taken at 
random. 

The resulting 120 samples are to be analyzed for a number of charac- 
teristics, such as organic matter and porosity. Because the laboratory 
work is costly, the workers would like to minimize it by pooling the field 
samples. In order to do this and still supply an estimate of sampling 
errors within strata, the following proposal has been made; can you tell 
us whether it is sound? 

The 10 samples for each plot would be combined into two composite 
samples, each of which would contain one field sample drawn at random 
from the two in each of the five strata. Data obtained from the resulting 
24 composite samples would be analyzed as follows: 


Source of Variation          D/F 
Treatments (T)                 2 
Blocks (B)                     3 
Experimental Error (TB)        6 
Sampling Error                12 


By this scheme no estimate of the variance among strata would be 
available; but this is of minor interest. The main reason for estimating 
the sampling error is to provide a basis for adjusting, if necessary, the 
number of future samples in the same strata. For this purpose the vari- 
ance of a single soil sample within a stratum (sx) would be estimated as 


2 


NSE ¢ 


ANSWER: The method you propose is good. You are really specifying 
two randomly placed sampling units in each plot. 

The five soil samples in each sampling unit are customarily 
taken in some systematic pattern (analogous to the knight’s move, for 
example) with a randomly chosen starting point. The scheme you out- 
line is one way of avoiding the possibility that the two units lie closely 
parallel. 
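
A small simulation can show how the sampling-error line of the proposed analysis relates to the variance of a single soil sample. This is a sketch under a stated assumption that does not come from the query: the laboratory value for a composite is taken to equal the mean of its five constituent field samples. On that assumption the sampling-error mean square estimates one fifth of the within-stratum variance, so five times the mean square recovers it. All names and numerical settings are hypothetical, and many more plots than the 12 of the experiment are simulated so that the average behaviour is visible.

```python
import random


def estimate_single_sample_variance(n_plots=20000, n_strata=5,
                                    sigma_within=2.0, sigma_strata=3.0,
                                    seed=1):
    """Simulate two composites per plot and return 5 x sampling-error MS."""
    rng = random.Random(seed)
    ss_within_plots = 0.0
    for _ in range(n_plots):
        # Stratum levels are shared by both composites of a plot.
        strata = [rng.gauss(0.0, sigma_strata) for _ in range(n_strata)]
        composites = []
        for _ in range(2):
            # One field sample from each stratum goes into each composite.
            samples = [s + rng.gauss(0.0, sigma_within) for s in strata]
            # Assumption: the composite reading is the mean of its samples.
            composites.append(sum(samples) / n_strata)
        # Two composites per plot give one degree of freedom per plot.
        ss_within_plots += (composites[0] - composites[1]) ** 2 / 2.0
    sampling_error_ms = ss_within_plots / n_plots
    return n_strata * sampling_error_ms


estimate = estimate_single_sample_variance()

if __name__ == "__main__":
    # True within-stratum variance in this simulation is 2.0 ** 2 = 4.0.
    print(round(estimate, 3))
```

The stratum differences cancel because each composite contains one sample from every stratum, which is why no estimate of the variance among strata is available from this scheme.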
QUERY 67: I have some data consisting of measurements of three 
cord properties (X₁ = strength, X₂ = size, and X₃ = moisture 
regain) for each of five samples in each of three groups representing 
three somewhat different manufacturing processes and raw material, 
which are not known necessarily to affect the above cord properties. 
The object was to determine the best estimate obtainable from these 
data of the coefficient of regression of X₁ on X₂ with statistical control of 
X₃, rather than any significance of differences between groups for any of 
the properties. The coefficients determined from the pooled squares and 
cross products of deviations from group means were outside the range of 
coefficients determined from individual groups, as follows, and the 
variance analysis did not indicate the regression to differ significantly 
between groups. 


Regressions    Group I    Group II    Group III    Pooled 
b₁₂.₃            3.932      4.206       6.460       507 
b₁₃.₂             .019       .023        .186       .012 


At first, my interest was more the mathematical one of whether it could 
be proved that such values were or were not computationally possible. 
Since it has occurred with other data, and each time the computations 
were carefully checked, my question is now more the statistical one of 
whether the pooled results produce the best estimates of the true regres- 
sion coefficient in such cases. 

These data were not derived from a planned experiment but were an 
attempt to analyse available information as a guide to further study. 


ANSWER: The situation presented is as unusual in my experience as 
it evidently was in yours. As you have decided, the values 
you got are not precluded in the algebraic setup. I suspect 
such relationships would not often occur in samples of more usual sizes. 
Your degrees of freedom are so few that the tests of homogeneity cannot 
be expected to detect even large departures from hypothetical conditions. 
Since your data are inadequate to furnish desired evidence, your 
question as to which regression produces the best estimate of the popu- 
lation coefficient must be answered (if there is any answer) by other 
knowledge you may have about the sampled population. If these were 
experimental data instead of fictitious, you might know, for example, 
that the manufacturing processes were properly controlled for the production 
of uniform quality and that deviations from the regressions may be 
considered homogeneous. If homogeneity is reasonably assumed, you 
would then have to decide whether the three manufacturing processes 
may be considered to affect or not to affect the regressions—the statis- 
tical evidence is suspect. If you believe that the regression coefficients 
are really drawn from a common population, then the pooled coefficients 
are the ones to use. But if you have reason to believe that the three 
manufacturing processes may affect the parameters, then the individual 
regressions should be used for each process. 

If you do not have the knowledge necessary to make these decisions, 
then the only way to proceed is to increase your sampling sufficiently to 
get the information from the experimental data themselves. 
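
That a pooled partial regression coefficient may fall outside the range of the group coefficients can be seen from a small constructed case. The sketch below uses made-up data, not the cord data of the query, and only two groups of five for brevity; the function names are hypothetical. The pooled coefficients solve the normal equations formed from the summed within-group squares and cross products, so they are a matrix-weighted rather than a simple weighted average of the group coefficients.

```python
def sscp(x1, x2, y):
    """Sums of squares and cross products of deviations from group means."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    d1 = [v - m1 for v in x1]
    d2 = [v - m2 for v in x2]
    dy = [v - my for v in y]
    s11 = sum(a * a for a in d1)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s22 = sum(b * b for b in d2)
    s1y = sum(a * c for a, c in zip(d1, dy))
    s2y = sum(b * c for b, c in zip(d2, dy))
    return (s11, s12, s22, s1y, s2y)


def partial_slopes(s):
    """Solve the two normal equations for the partial regression slopes."""
    s11, s12, s22, s1y, s2y = s
    det = s11 * s22 - s12 * s12
    return ((s1y * s22 - s2y * s12) / det, (s2y * s11 - s1y * s12) / det)


# Made-up data: the regressors are positively related in group A and
# negatively related in group B; y is an exact linear function in each.
x1 = [-2, -1, 0, 1, 2]
x2_a = [-2, -1, 0, 2, 1]
x2_b = [2, 1, 0, -2, -1]
y_a = [1.0 * u + 1.0 * v for u, v in zip(x1, x2_a)]
y_b = [1.2 * u + 0.0 * v for u, v in zip(x1, x2_b)]

s_a = sscp(x1, x2_a, y_a)
s_b = sscp(x1, x2_b, y_b)
pooled = tuple(a + b for a, b in zip(s_a, s_b))

slopes_a = partial_slopes(s_a)
slopes_b = partial_slopes(s_b)
slopes_pooled = partial_slopes(pooled)

if __name__ == "__main__":
    print(slopes_a, slopes_b, slopes_pooled)
```

Here the coefficients on the first regressor are 1.0 and 1.2 within the groups but 1.55 when pooled, which is the kind of result the query describes.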


QUERY 68: A problem has arisen in the course of my research for 
which I have been unable to find an adequate answer. The experiment 
involves the determination of the effect of androgen treatment 
on the activity of the thyroid gland of male white leghorn chicks. 
The activity is correlated with the size of the cells in the gland; hence 50 
cells in each of 5 glands from each series (experimental and control) were 
measured. 

If I am not wrong in my understanding of the problem, three analyses 
of variance are open for possible consideration. (1) Using the sum of 
squares of treatments as related to the sum of squares of the individual 
cell measurements in estimating F, a highly significant value (47.5) 
results. (2) If F is calculated from the sum of squares of treatments and 
the sum of squares between thyroids, then the result (3.89) suggests no 
difference between the populations. (3) Analysis of variance using the 
means of cell measurements yields an F value (6.08) which indicates 
significance at about the 4% level. 

The analyses of variance are given in the tables below. 

I had decided that the first method would give me the most sensitive 
test. Using either of the other methods one does not take into account 
the variability within the thyroids which I believe is important in the 
analysis. Or can one ignore this variability and assume the means are 
true means since such a large number of measurements is taken in each 
gland? Yet, this does not seem legitimate to me. I am not interested 
in the effect of the treatment upon the cells of the thyroid but rather upon 
the thyroid as shown by the change in the cells. Yet the variability of 
cell heights should play some part in the analysis, I believe. 


                      D.F.      Ssq      MSsq 
Between thyroids        9      75.90     8.43 
Between cells         490     338.24     0.690 
Treatments              1      32.77    32.77 

F = 32.77/8.43 = 3.89 


                      D.F.      Ssq      MSsq 
Between thyroids        8     0.8626    0.1078 
Treatments              1     0.6554    0.6554 

F = 0.6554/0.1078 = 6.08 


I have been unable to find a method of setting the fiducial limits of 
the difference between the mean of the means of the two series, 0.52. 


ANSWER: It seems more convenient to analyze variance not in two 
or three tables but in a single one, as follows: 

Source of Variation          Degrees of    Sum of     Mean 
                             Freedom       Squares    Square 
Treatments                       1          32.77     32.77 
Thyroids, same Treatment         8          43.13      5.39 
Cells, same Thyroid            490         338.24      0.690 
Total                          499         414.14 


In one of your tables, you include the single degree of freedom for 
treatments among the 9 for all thyroids. I have separated these degrees 
of freedom in the combined analysis. 

In your analysis of means, the mean squares for thyroids and treat- 
ments are each 1/50 of the corresponding mean squares in my table. 

It is clear that there is a great deal of variation among thyroids over 
and above that which can be accounted for by the sampling variation 
of cells from the same gland. In testing the hypothesis that the treat- 
ment is without effect on the thyroid glands of male white leghorn 
chicks, the real experimental error must include this variation among 
chicks. The variation among cells within thyroids is, in fact, a part of 
the mean square for thyroids; you are right in thinking that it cannot be 
ignored. The test is, 


F = 32.77/5.39 = 6.08, df = 1 and 8, P = 0.04 


It is assumed that the chicks were taken at random from the sampled 
population of white leghorn males and that the cells which were meas- 
ured constitute random selections from the cells of the several thyroids. 

Another assumption which has been made above is that the thyroid 
means are normally distributed. If σ is the same for the treated and 
untreated populations, confidence limits may be set on the mean difference 
by use of $s_{\bar{d}} = \sqrt{2(5.39)/250} = 0.207$ with df = 8. For P = 0.95, 
t = 2.306, so that the half interval is 0.48. 
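
The arithmetic of the three F ratios and of the confidence limits can be reproduced from the sums of squares in the combined table. The following is a minimal sketch with hypothetical variable names; the value of t for 8 degrees of freedom is taken from the answer rather than computed.

```python
import math

# Sums of squares and degrees of freedom from the combined table.
ss_treat, df_treat = 32.77, 1
ss_thyroid, df_thyroid = 43.13, 8      # thyroids within the same treatment
ss_cells, df_cells = 338.24, 490       # cells within the same thyroid
cells_per_series = 5 * 50              # 5 glands x 50 cells in each series

ms_treat = ss_treat / df_treat
ms_thyroid = ss_thyroid / df_thyroid
ms_cells = ss_cells / df_cells

# (1) Treatments against cells: ignores the variation among chicks.
f_vs_cells = ms_treat / ms_cells
# (2) Treatments against all 9 df between thyroids, which wrongly leaves
#     the treatment degree of freedom inside the error term.
f_vs_all_thyroids = ms_treat / ((ss_treat + ss_thyroid) / 9)
# (3) Treatments against thyroids within treatments: the test in the answer.
f_vs_thyroids = ms_treat / ms_thyroid

# Standard error of the difference between the two series means,
# and the half-width of the 95 per cent confidence interval.
se_diff = math.sqrt(2 * ms_thyroid / cells_per_series)
t_8 = 2.306  # two-sided 5 per cent point of t with 8 df, as quoted
half_width = t_8 * se_diff

if __name__ == "__main__":
    print(round(f_vs_cells, 1), round(f_vs_all_thyroids, 2),
          round(f_vs_thyroids, 2))
    print(round(se_diff, 3), round(half_width, 2))
```

Only the third ratio uses an error term that reflects chick-to-chick variation, which is why it is the one retained in the answer.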


QUERY 69: The setup of an experiment was as follows: Six rations 
were compared using a total of 186 turkey poults. The 
birds were weighed and listed in order of weight. The 6 heaviest 
birds were assigned to rations 1, 2, 3, 4, 5, 6 respectively, the next six 
birds to rations 6, 5, 4, 3, 2, 1 respectively. This method of selection 
was continued until all 186 birds had been assigned to the 6 rations. The 
object of this procedure was, of course, to have all 6 groups start out the 
experiment with approximately the same average weight. At 11 weeks 
of age all birds were weighed and the weights subjected to an analysis 
of co-variance. Analysis of co-variance was used because although the 
average starting weight was approximately the same for each of the 6 
groups there were some differences. 

My contention has been that, apart from any question of distribution 
of sexes or need for replication of pens (in this experiment there was a 
single pen for each of the 6 diets, and the males and females were analyzed 
separately), no accurate estimate of the error of the group means is 
possible because the birds have not been assigned to the groups at 
random. 

I feel that a fundamental and important point may be involved here 
since one very frequently finds described in the literature the statistical 
analysis of data from experiments in which the experimental animals 
have been assigned to the groups in a systematic or non-random manner. 


ANSWER: The most serious flaw in the experiment you describe is the 
absence of true replicates. With a single pen for each of 
the six diets, there is no reliable estimate of the experimental 
error. One could compute the difference between cages and 
perhaps a standard error of the difference, but there is no information 
as to how much of the difference can be attributed to diet and how much 
to other environmental factors. 

Only one estimate of error occurs to me in this case. If the six treat- 
ments represent six different levels in a single dietary constituent such 
that the results can be fitted with a straight line, three degrees of freedom 
representing the variation around the fitted line would be available for 
an error term. If this were of the same magnitude as the variation among 
birds within cages, the latter might be given some credence as an esti- 
mate of experimental error. 

The systematic assignment of poults in order of weight might be a 
source of trouble. Intuitively it would not worry me too much for I 
suspect that other factors may be more important than the variation 
within weight groups of six poults. A preferable scheme, of course, 
would be to assign the six birds in each successive weight-class to the 
rations at random or, preferably, in accord with the rows in a series of 
randomized 6 X 6 Latin squares. It is not too important that the con- 
comitant or initial characteristic in covariance be assigned at random, 
provided that this does not bias the distribution to different treatments. 
This follows from the fact that covariance is basically a regression tech- 
nique and the independent variate in regression can be selected arbi- 
trarily by the experimenter without biasing the results. 
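
The two ways of allotting the poults can be set side by side in a short sketch. The function names and the stand-in weights are hypothetical; the first function follows the systematic scheme of the query, and the second draws a fresh random order of rations within each successive weight class of six. The Latin-square refinement mentioned above is not attempted here.

```python
import random


def serpentine_assign(weights, n_rations=6):
    """Systematic scheme: rank by weight, deal rations 1..6, then 6..1."""
    order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    ration = [0] * len(weights)
    for pos, bird in enumerate(order):
        block, within = divmod(pos, n_rations)
        ration[bird] = within + 1 if block % 2 == 0 else n_rations - within
    return ration


def randomized_block_assign(weights, n_rations=6, seed=0):
    """Random allotment of rations within each successive weight class."""
    rng = random.Random(seed)
    order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    ration = [0] * len(weights)
    for start in range(0, len(order), n_rations):
        block = order[start:start + n_rations]
        labels = rng.sample(range(1, n_rations + 1), len(block))
        for bird, label in zip(block, labels):
            ration[bird] = label
    return ration


# Stand-in weights for 186 poults, already listed heaviest first.
weights = [1000 - i for i in range(186)]
systematic = serpentine_assign(weights)
randomized = randomized_block_assign(weights)

if __name__ == "__main__":
    print(systematic[:12])
    print(randomized[:12])
```

Both schemes give 31 birds to each ration and balance initial weight closely; only the second supplies the random element on which an estimate of error among birds rests, and neither replaces the need for replicated pens.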

In young animals, the use of initial weight as a basis of assignment to 
blocks, or as a covariate, is not very effective as a means of controlling 
error. But a knowledge of food consumed by the experimental units 
(single birds or small groups of them) often results in worthwhile gains 
in efficiency. This is an added argument in favor of true replication. 


C. I. Bliss 


THE BIOMETRIC SOCIETY 


One of the first decisions of the Biometric Society was to apply for 
affiliation with the International Statistical Institute as “an international 
organization concerned with a field of statistical specialization.” We are 
happy to report that this affiliation has now been completed. There will 
be an exchange of representatives between the two organizations and 
all members of the Society attending the Conference in Geneva will 
receive an invitation to the ISI meetings in Berne on September 4-10. 
Members of the ISI in turn are invited to attend the Second Biometric 
Conference in Geneva on August 30 to September 2, announced in preceding 
issues of Biometrics. 

In order to increase our effectiveness internationally, we have been 
in contact with UNESCO through Professor Pierre Auger, Director of 
the Department of Natural Sciences, Professor P. Vayssiére, Secretary- 
General of the International Union of Biological Sciences, Professor 
Stuart Mudd, Secretary of the IUBS, and others. Biometry concerns 
so many different sciences that no simple solution was immediately 
available. One reason is that our Society is organized with the individual 
member as the unit, instead of with the nation as the unit as in the inter- 
national unions. Recently, the International Union of Biological Sci- 
ences has invited the Biometric Society to serve as a specialized Section 
of the IUBS, and this invitation has been accepted by the Council of the 
Society. The IUBS has allotted the Section $200 for expenses 
in 1949 and funds to aid in the publication of the proceedings of the 
Second International Biometric Conference at Geneva have been in- 
cluded in their budget for 1950. An International Union of Mathe- 
matical Sciences is projected and when formed, it is proposed that a 
mixed Commission on biometry under the International Council of 
Scientific Unions should take over the functions filled now by the new 
section of the IUBS. 

As a result of the recent balloting, D. F. Votaw and E. K. Harris, 
tellers for the Society, announce the adoption of the two amendments to the 
Constitution and the election of the following Council members for the 
term 1949-51 inclusive: J. Berkson, W. G. Cochran, D. Mainland, 
V. G. Panse, O. E. Sette and F. Yates. 

As this issue of Biometrics went to press, the two new regions of the 
Society were proceeding with their organization. The Région Française, 
comprising 45 members, held its first formal meeting on March 15 at the 
Laboratoire de Zoologie de la Sorbonne. By-laws for the Region were 
considered. It was agreed that the Région Française should form 
an official French society conforming to the French law of 1901 governing 
such associations. Draft by-laws will be submitted to all regional mem- 
bers and voted upon at the next meeting scheduled in May. The meet- 
ing named a regional committee of three, consisting of Mlle. Colette 
Rothschild and M. Lamotte (both of the Centre National de la Recherche 
Scientifique, Paris) and Dr. Marcel P. Schutzenberger, of the Hôpital 
St. Louis. Two papers were presented at this first meeting, one by 
Dr. Leon Vaugien entitled “Poids relatifs de la thyroide, des surrénales, 
et de l’hypophyse antérieure chez les oiseaux” and another, “Analyse de 
la relation entre période d’incubation et nombre de particules virulentes 
injectées, dans le cas de la sensibilité héréditaire au gaz carbonique chez 
la Drosophile” by Professor Philippe L’Heritier and M. Kriatchko. 

As reported in our last issue, the first meeting of the Indian Region 
was held in Allahabad on January 5, 1949. The following committee 
was elected to complete the regional organization: Professor P. C. 
Mahalanobis, Dr. U. S. Nair, Dr. P. V. Sukhatme, Dr. R. C. Bose, 
Dr. B. Ramamurthy, Dr. C. R. Rao and Mr. V. M. Dandekar. Draft 
by-laws have been prepared and sent to some 40 members of the Indian 
Region for approval. They provide for a regional vice-president, a 
secretary, a treasurer and a regional committee of nine members, all of 
whom will be voted upon by mail ballot together with the by-laws. 
Through Vice-President Mahalanobis, the Indian Statistical Institute 
has offered facilities for housing the regional office. This will aid in 
expanding the scope and activities of the Indian Region. 

On April 20 the Eastern North American Region sponsored a joint 
session with the American Society for Pharmacology and Experimental 
Therapeutics at the Detroit meeting of the Federation of American 
Societies for Experimental Biology. The program consisted of a bio- 
metrical clinic on pharmacological problems. More than sixty questions 
had been submitted in advance by the pharmacologists, but only a small 
fraction of these could be considered by the panel consisting of C. I. 
Bliss, A. E. Brandt, K. A. Brownlee, S. Lee Crump, D. B. DeLury and 
Lloyd C. Miller (Chairman). About 150 attended the meeting. 

The by-laws of the Western North American Region, adopted at 
Seattle, Washington, in November 1948, have been ratified by the 
Council of the Society and are reprinted below: 

BY-LAWS: WESTERN NORTH AMERICAN REGION 


The Western North American Region is governed by the constitution 
of the Biometric Society and the following regional By-laws. 

1. The aim of the Region shall be to promote the understanding of 
quantitative biology and the application of statistical methods to 
biology. 

2. Membership. By definition of the parent society the WNAR 
includes Mexico and those portions of the United States and Canada lying 
west of approximately 104° West Longitude. All scientists residing in 
this region who have a substantial interest in quantitative biology, 
whether primarily biologists, statisticians or mathematicians, will be 
welcomed into the organization. 

3. Regional Committee. The Regional Committee shall have author- 
ity to transact necessary business at all times when the annual meeting is 
not in session. It shall report to and be responsible to the membership 
as represented by the annual meeting and to the council of the parent 
society. It shall consist of the regional vice president, who shall be the 
presiding officer, the regional secretary-treasurer and six ordinary members. 
The regional vice president and the regional secretary-treasurer 
shall be elected annually and may not serve for more than two consecutive 
terms; the ordinary members shall serve for three years, two to be 
elected each year. At the initial election six ordinary members shall be 
elected; these shall be divided by lot into three groups of two each, one 
to serve three years, another two years and a third one year. 

4. Affiliations. The Regional Committee may affiliate with national 
societies for the purpose of joint meetings when common aims will be 
so served. 

5. Annual meeting. There shall be an annual meeting of the region at 
a time and place to be determined by the Regional Committee on advice 
of members. 

6. Amendment. These By-laws may be amended by a two-thirds vote 
of the members present at any annual meeting. 


NEWS AND NOTES 


ENGLAND—The University of Cambridge has set up a Statistical 
Laboratory under the auspices of the Faculty of Mathematics which was 
opened on March 2, 1949. The staff consists of John Wishart, Reader 
in Statistics since 1931 (who also has responsibility for the instruction of 
agricultural graduate students in statistics and field experimentation); 
Henry E. Daniels and Frank J. Anscombe, University Lecturers in 
Mathematics; and Dennis V. Lindley, University Demonstrator. The 
Laboratory accommodates the staff, graduate and visiting students, and 
computing assistants. A course is given yearly leading to a graduate 
Diploma in Mathematical Statistics, during which the candidates do 
work in one of a number of possible fields of application of statistical 
methods. The remainder of the graduates work for the Ph.D. degree. 
The laboratory also offers a consultant service in statistics to University 
and other Departments, and is closely associated, in particular, with the 
University’s Department of Applied Economics, directed by Richard 
Stone. 


INDIA—K. B. Madhava returned from Government work in the 
Labour Bureau to the University of Mysore, from which he has retired. 
He writes, “I have not settled down to anything particularly yet, but I 
shall probably work on my own, combining my actuarial practice among 
my old insurance clients with some consulting statistical work.” His 
new address is 70-A Stock Exchange Building, Apollo Street, Fort, 
Bombay. We would like to present a paragraph Mr. Madhava wrote on 
the scope of statistics from his article on “The role of statistics in the 
formulation of a progressive labour policy.” He says, “The value, 
rather the need, of statistics in practically every field of human endeavour 
in a present day administration of a world-knit progressive State is too 
well known to call for restatement at any length. Statistics has been 
described variously as the straw out of which bricks are made, as the 
brain and brawn of a Government, as the counterpart of operational 
research in relation to fighting services, etc. In essence, statistics may be 
likened to the all-pervasive science of meteorology, wide in coverage, 
dominating as a watch tower, and valuable in the service of man to 
navigate safely.” We anticipate many active years of service are in 
store for Mr. Madhava. ... The Indian Society of Agricultural Statistics 
was founded in 1947 to promote the study of statistical theory and its 
application to agriculture, animal husbandry, agricultural economics 
and allied subjects. The first issue of the society’s journal has appeared. 
Included are the Presidential Address by The Honorable Rajendra 
Prasad, Minister for Food and Agriculture, and three addresses given 
at a symposium on “Statistical Organization for India with special 
reference to Agriculture” by The Honorable P. K. Shanmukham Chetty, 
Minister for Finance, by V. G. Panse, Institute of Plant Industry, 
Indore and by W. P. Natu, Economic and Statistical Adviser, Government 
of India. The other articles were “Crop surveys in India” by 
V. G. Panse and P. V. Sukhatme, “A new approach to sampling distributions 
of the multivariate normal theory” and “On the distribution of 
estimated error components in analysis of variance and covariance” 
by R. D. Narain, “Estimation of genetic variability in plants” by 
V. G. Panse and S. D. Bokil, and “On fractional replication of the general 
symmetrical factorial design” by K. Kishen. 


IRELAND—We are going to quote from a letter to the Secretary, 
The Biometric Society from J. J. Brady, Clontarf, Dublin. “You invite 
suggestions in regard to furthering the growth and development of The 
Biometric Society. My opinion is that the great majority of the articles 
published in Biometrics are too mathematical and theoretical to be of 
assistance to the vast numbers of research workers who require to use 
statistical methods in their experimental work but who have not the 
mathematical training essential to an understanding of such articles. 
... I think there is a great future for a statistical journal which would 
cater to the needs of the biological worker. Why not make Biometrics 
serve that purpose?” The Editorial Committee would like to have 
more articles which show the applications of statistical methods, or 
articles that combine applications with new points in theory. A large 
percentage of our subscribers are research workers in biological fields who 
want to learn more about these statistical tools. 


JAPAN—A Japanese Conference on Experimental Statistics spon- 
sored by the Ministry of Agriculture and Forestry of the Japanese 
government was held in Tokyo in March. About 75 experiment station 
representatives from throughout Japan attended this one-week confer- 
ence. Matayoshi Hatamura of the Ministry’s Agricultural Improvement 
Bureau served as chairman. Warren H. Leonard, chief of the Agriculture 
Division in the Occupation, discussed general field plot techniques. 
Joseph C. Dodson, statistician in the Agriculture Division, discussed the 
design and statistical analysis of experiments. Japanese officials plan 
to hold a similar conference at a later date as a means of improving 
research methods used in their experiment stations. Mr. Dodson’s 
assignment is in the production branch of the Agriculture Division. He 
is working on food collections. Mr. Leonard writes that he plans to 
return in August to the Colorado Agricultural College, Boulder. 


NEW ZEALAND—J. T. Campbell, Senior Lecturer in Mathematics, 
Victoria University College, Wellington, wrote, “When I returned to 
New Zealand in 1935, I found that there was little provision for instruction 
in statistical work. ... The situation is improving.” We would like 
to hear about the “use of twins in dairy cattle nutrition experiments and 
similar investigations.” ... At our request, A. A. Rayner, Biometrician, 
Extension Division, Department of Agriculture, Wellington, has written 
about his work. “The bulk of my work consists of the analysis of data 
from trials supervised by the Crop Experimentalist, P. B. Lynch, who is 
well known for his paper on the measurement of pastoral production. 
In 1947-48 there were 872 trials laid down mostly on the land of co-oper- 
ating farmers. The trials embrace almost every type of crop and pasture 
grown by farmers in New Zealand, and many special types of investiga- 
tion. I have little to do with routine analysis, but special problems are 
constantly being met, and there is always the designing of experiments. 
In the past few years the designs have been tending to increase in com- 
plexity. There is some work to be done for other Divisions of the 
Department, notably the Animal Research Division, whose milking- 
machine experiments are mainly the concern of Jean Miller. Horticul- 
ture has brought us experiments on storage of onions and packing of 
apples. For the Livestock Division, I have designed a national sampling 
survey for the assessment of such factors as pig losses and size of litters. 
I should say that the chief feature of our trials on farmers’ land is the 
way in which they are laid down and harvested in accordance with 
normal farming practice, with modifications to suit the small experi- 
mental plots. For instance cereal trials are drilled with special 7-coulter 
drills and fertilizers are usually drilled with the seed, though this has its 
difficulties in factorial experiments. Sampling of plots at harvest has 
largely been abandoned in favour of header-harvesting. I think you call 
this sort of machine a ‘combine’. A technique has been evolved in 
which the header is continuously in motion, and border rows are harvested 
at the same time, but discarded so far as weighing is concerned.” 


UNITED STATES—“At the national meeting of the American Home 
Economics Association in Minneapolis last June, the Research Departments 
Committee on Research Training, under the chairmanship of 
Dr. Margaret A. Ohlson, was commissioned to plan a workshop on design 
of experiments and surveys for persons directing and performing research 
in the field of Home Economics. Iowa State College was selected as the 
place to hold the workshop. The dates are June 13-25th, 1949. Mr. 
Paul Homeyer, Associate Professor in the Statistical Laboratory, has 
been designated as the Statistical Director of the workshop.” . . . T. W. 
Anderson, Professor of Mathematical Statistics, Columbia University, 
New York City was a guest lecturer at Iowa State College, Ames during 
his Easter vacation. He gave two lectures on “Applications of multi- 
variate analysis to problems in psychology and education” and a lecture 
on “Estimation of linear restrictions on regression coefficients and applications 
to econometric models.” . . . We cannot keep up with Geoffrey 
Beall who is now with Research Laboratories, Swift and Company, 
Chicago. .. . When we reported last year about K. A. Brownlee, he was 
with Research and Development Department, The Distillers Company, 
Ltd., Surrey, England. Now, he is with E. R. Squibb and Sons, New 
Brunswick, New Jersey. . . . S. Lee Crump has been at the University 
of Rochester, Rochester, New York since last summer. He went to 
New York from the Statistical Laboratory, Iowa State College, Ames. 
. . . Also to leave Ames last fall was Walter T. Federer, who is Professor 
of Biological Statistics, New York State College of Agriculture at Cornell 
University, Ithaca. . . . David W. Fassett, formerly at Cardiology Serv- 
ice, Jackson Memorial Hospital, Miami, Florida now gives this address: 
Laboratory of Industrial Medicine, Kodak Park Works, Eastman Kodak 
Company, Rochester. . . . Marguerite F. Hall is now with School of 
Medicine, Health Center, University of Washington, Seattle. She was 
an Associate Professor, School of Public Health, University of Michigan 
before going to the West coast. . . . William P. Martin is with the 
Agronomy Department, Ohio State University, Columbus. Previously, 
he was with the Southwest Forest and Range Experiment Station, 
Tucson, Arizona. . . . Donald W. Maclaury, Department of Poultry 
Husbandry, University of Kentucky, Lexington was formerly with the 
Department of Poultry Husbandry at Iowa State College, Ames. . . . 
Maurice Whittinghill of the Department of Zoology, University of 
North Carolina, Chapel Hill joined the Biology Division, Oak Ridge 
National Laboratories, for a six months period of research. He is doing 
research on a genetic problem using Drosophila as the experimental 
animal. . . . J. A. Rafferty, Captain M.C., will continue in his position as 
Chief of the Department of Biometrics, School of Aviation Medicine, 
Randolph Field, Texas, when he becomes a civilian next month. Mr. 
Rafferty’s concept of the functions of a department of biometrics is the 
full breadth of the subtitle on The Biometric Society’s program an- 
nouncements, “the mathematical and statistical aspects of biology.” 
In line with his contention that “statisticians should get out of their rut 
of testing statistical hypotheses”, his program is to include theoretical 
biology and medicine as well as traditional mass-data biometry and 
modern experimental statistics. At present the Department of Biometrics 
consists of forty personnel, mostly technical clerks serving as computers 
and IBM operators. Mass-data projects on Air Force medical statistics 
are conducted because of their special interest to the military establishment. About 
one-third of the workload is devoted to the design and analysis of labo- 
ratory and questionnaire investigations, done in consultation with 
research workers in all other departments of the School of Aviation 
Medicine. Another third of the workload is involved in mathematical 
and applied research in statistics, depending on the interests and capa- 
bilities of the professional statisticians and on the problems arising in 
applying statistics to aviation medicine. For instance, in the depart- 
ment, empirical sampling projects are under way concerning the relaxa- 
tion of assumptions in the analysis of variance. Due to the importance 
of multivariate analysis in medical research, contracts for mathematical 
statistical research have been let to the University of California for 
work on discriminatory analysis, under the direction of J. Neyman; 
and to Yale University, for work in compound symmetry tests under the 
direction of David Votaw. Dr. Rafferty, as a medical research theorist, 
is interested in gathering into the Department of Biometrics as col- 
leagues, mathematical statisticians and biomathematicians to work on 
mathematical models for biological and medical phenomena, to offer 
“cradle to grave” theory service to the experiments in the various basic 
medical research fields... . Allyn W. Kimball, Jr. has been with the 
Department of Biometrics in the capacity of experimental statistician 
since May, 1948. Mr. Kimball, a candidate for the Ph.D. degree in 
experimental statistics at the University of North Carolina, devotes 
much of his time to consultations with research workers in other depart- 
ments on problems of design of experiments and interpretation of results. 
The progress made toward building up respect and confidence for sta- 
tistics among the medical research personnel has been substantial and 
gratifying. In addition to these duties, Mr. Kimball conducts purely 
statistical research pertinent to aviation medicine and acts as project 
officer on contracts for statistical research awarded to civilian establishments. 


